Abstract

General Probabilistic Theories provide the most general mathematical framework for the theory of probability in an operationally natural manner, and generalize classical and quantum theories. In this article, we study state discrimination problems in general probabilistic theories using a Bayesian strategy. After reformulating the theories with mathematical rigor, we first prove that an optimal observable to discriminate any (finite) number of states always exists in the most general setting. Next, we revisit our recently proposed geometric approach to the problem and show that, for two-state discrimination, this approach is indeed effective in arbitrary dimensional cases. Moreover, our method reveals an operational meaning, in terms of the optimal success probability, of Gudder's "intrinsic metric", which turns out to generalize the trace distance for quantum systems. As a by-product, an information-disturbance theorem in general probabilistic theories is derived, generalizing its well-known quantum version.

1 Introduction 

1.1 Background 

Among many attempts to understand quantum theory axiomatically, an operationally natural approach to the general theory of probability, recently referred to as general probabilistic theories (or generic probabilistic models), has been studied [?, 9, 11, 16] and has attracted much attention in the recent development of quantum information theory (e.g., [?]). Such an approach provides a unified mathematical framework that encompasses not only classical and quantum theories but also more general settings that could be candidates for possible future extensions of present quantum theory. One motivation for such an approach is to understand quantum mechanics better by introducing various viewpoints, especially information-theoretic ones. Another motivation to investigate such a general theory has arisen recently from research on quantum information theory, including quantum information security. Among recent developments in information theory and information security, one of the greatest impacts came from Shor's discovery [19] of an efficient (i.e., polynomial-time) integer factoring algorithm for quantum computers, which reveals a future practical threat against several standard cryptosystems in use today, such as the RSA cryptosystem [17]. This history suggests a non-negligible possibility that any cryptosystem whose security is based on present physical theory, even quantum theory, may become insecure once a more advanced physical theory is discovered and applied to information technology. Hence a study of possible extensions of present physical theory is of importance and interest from both theoretical and practical standpoints.

One of the most important aims of studying general probabilistic theories is to determine which characteristics are typical of classical or quantum systems and which are not. For example, in a recent article [1] Barnum et al. investigated cloning and broadcasting of states in general probabilistic theories. They proved (in finite-dimensional cases) that universal cloning or universal broadcasting is possible only for classical systems, which generalizes the No-Cloning Theorem and the No-Broadcasting Theorem for quantum systems [2, 6, 20, 21]. Another example relevant to our present work is our recent study [13] on minimum-error state discrimination problems in general probabilistic theories (in this article the word "minimum-error" is omitted, since we do not discuss other kinds of discrimination problems such as unambiguous state discrimination). State discrimination problems have been well investigated for quantum systems (e.g., [3, 10, 12, 22]), but optimal success probabilities for discriminating given states and the corresponding optimal measurements have been determined only in very restricted cases, such as two-state cases. In [13] we gave a formulation of state discrimination problems in finite-dimensional general probabilistic theories and introduced, from a geometric viewpoint, a class of special ensembles of states called Helstrom families: we showed that the optimal success probability can be determined by a Helstrom family if one exists. Regarding existence, we discussed only two-state cases and some other cases of states with symmetric configurations, and it was shown that a Helstrom family always exists for both classical and quantum systems in any "generic" case (specified in a certain well-defined manner). However, the existence of Helstrom families in more general (neither classical nor quantum) cases has not been clarified. The main aim of this article is to study the existence problem of Helstrom families in general probabilistic theories of arbitrary dimension that are neither classical nor quantum.

1.2 Our contributions and organization of the article 

In Sect. 2 we summarize a mathematical framework for general probabilistic theories. Following several preceding works on general probabilistic theories (e.g., [?, ?, ?, 13, ?, 16]), our formulation is based on the notions of states, effects and observables, as well as the notion of probabilistic state ensembles. Namely, we regard the state space as a "convex structure" [9]. A standard argument shows that the associated "separated" state space is embedded as a convex subset $\mathcal{S}$ in a real vector space $\mathcal{V}$. For the sake of minimality, we assume that $\mathcal{V}$ is the affine hull of $\mathcal{S}$ and that the topology of $\mathcal{V}$ is the weak topology generated by all effects on $\mathcal{S}$. We emphasize that $\mathcal{S}$ is usually assumed to be compact with respect to this topology, but in the present article compactness is not assumed, in order to keep our setting as general as possible. In fact, when $\mathcal{S}$ is not compact with respect to this topology, we further take a "virtual state space" $\overline{\mathcal{S}} \supset \mathcal{S}$ and a "virtual underlying space" $\overline{\mathcal{V}} \supset \mathcal{V}$ such that $\overline{\mathcal{S}}$ is a compact convex subset of $\overline{\mathcal{V}}$ and some additional conditions are satisfied (see Theorem 2.1 for the precise statement):

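$$
\begin{array}{ccc}
\mathrm{cl}_{\overline{\mathcal{V}}}(\mathcal{S}) = \overline{\mathcal{S}} & \subset & \overline{\mathcal{V}} \\
\cup & & \cup \\
\overline{\mathcal{S}_0} = \mathcal{S}_0/{\sim} \;\cong\; \mathcal{S} & \subset & \mathcal{V}
\end{array}
$$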

By these properties, the objects $\overline{\mathcal{V}}$, $\overline{\mathcal{S}}$ and $\mathcal{V}$ are uniquely determined by $\mathcal{S}$; we call this setting the minimal framework. See the Appendices for further technical details. Now the effects on the "real" state space $\mathcal{S}$ are in one-to-one correspondence with their continuous extensions to $\overline{\mathcal{S}}$, called "virtual effects". A similar correspondence exists between observables on $\mathcal{S}$ and "virtual observables" on $\overline{\mathcal{S}}$. Moreover, for each "virtual state" $s \in \overline{\mathcal{S}} \setminus \mathcal{S}$, any $\varepsilon > 0$ and any observables $O_1, \dots, O_k$, there exists a "real state" $s' \in \mathcal{S}$ such that the results of measuring these $O_j$ at $s'$ are within $\varepsilon$ of the results at $s$; physically, this means that virtual states and real states are indistinguishable by experiments. Note that, in finite-dimensional cases, the underlying space $\mathcal{V}$ is always isomorphic to a finite-dimensional Euclidean space, and $\mathcal{S}$ is then nothing but a bounded convex subset of the Euclidean space $\mathcal{V}$.

In Sect. 3 we give a natural formulation of state discrimination problems in general probabilistic theories, following our preceding work [13]. Our present formulation coincides with the preceding one when $\mathcal{S}$ is compact. Moreover, we show that an optimal observable always exists for discriminating any (finite) number of given states with arbitrary a priori occurrence probabilities (see Theorem 3.1). Although it would be possible to interpret this result as a special case of a general theorem by Ozawa [16], we include the proof in this article for the reader's convenience because of its simplicity. (The proof uses only the existence theorem of maximum values of continuous functions on compact spaces and some elementary arguments on topological spaces.) Note that the argument in Sect. 3 is closed within the real state space $\mathcal{S}$; therefore the additional notions such as virtual states and virtual observables are not yet needed.

In Sect. 4 we introduce the notion of (weak) Helstrom families by translating the definition given in [13] to our minimal framework. A weak Helstrom family yields an upper bound on the optimal success probability for discriminating given states, while a Helstrom family yields the tight bound. A sufficient condition for a weak Helstrom family to be a Helstrom family was given in [13]. As a consequence of the above-mentioned existence theorem of an optimal observable, we show that this sufficient condition is also necessary, except for meaningless cases called non-generic cases. (By definition, generic cases are the cases where there exists a discrimination strategy better than simply outputting the candidate state with the highest a priori probability.) In two-state cases, the above necessary and sufficient condition turns out to be "distinguishability" of two (possibly virtual) states $t_1, t_2$ associated with a given weak Helstrom family; therefore the problem of finding a Helstrom family reduces to a study of distinguishable (virtual) states.

Finally, in Sect. 5 we prove that a Helstrom family for two-state discrimination always exists in generic cases (see Theorem 5.3); hence in such cases the optimal success probability can be determined (at least in principle) just by finding a Helstrom family. Our argument works in a general case of arbitrary dimension that may be neither classical nor quantum. Owing to this result, we also give a simple criterion for generic cases among all two-state cases (see Theorem 5.4): given two distinct candidate states $s_1, s_2 \in \mathcal{S}$ with positive a priori probabilities $p_1, p_2$, the case is non-generic if and only if $p_1 \neq p_2$ and the element $s^* = (p_1 s_1 - p_2 s_2)/(p_1 - p_2)$ of $\mathcal{V}$ lies in the virtual state space $\overline{\mathcal{S}}$. (Indeed, if, say, $p_1 > p_2$ and $s^* \in \overline{\mathcal{S}}$, then $p_1 s_1 = (p_1 - p_2)s^* + p_2 s_2$, so every observable satisfies $p_1 e_1(s_1) + p_2 e_2(s_2) = (p_1 - p_2)\bar{e}_1(s^*) + p_2 \leq p_1$.) In particular, the equiprobable cases $p_1 = p_2 = 1/2$ are always generic; therefore in such cases we can always discriminate the given states (at least in principle) with probability higher than 1/2. Moreover, our result also reveals a relation between Gudder's distance of two states $s_1, s_2 \in \mathcal{S}$ [9] and the optimal success probability of discriminating $s_1$ and $s_2$ in equiprobable cases, and hence an operational meaning of Gudder's intrinsic metric [9], which gives an operationally natural generalization of the trace distance for quantum systems to general probabilistic theories (see Remark 5.1). As an application, a simple (qualitative) version of the information-disturbance theorem in general probabilistic theories is shown to hold, generalizing the corresponding theorem in quantum theory.
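In a finite classical system (see Example 2.1) the virtual state space coincides with the probability simplex, so the position of $s^*$ can be compared with genericity directly. The Python sketch below is our own illustration rather than part of the argument of this article; it assumes the standard fact that, for finitely many outcomes, the optimal two-state success probability is $\sum_x \max(p_1 s_1(x), p_2 s_2(x))$, and all function names are hypothetical.

```python
from fractions import Fraction as F


def bayes_optimum(p1, s1, p2, s2):
    """Optimal success probability for two classical states: at each outcome x,
    guess whichever candidate has the larger weighted likelihood p_i * s_i(x)."""
    return sum(max(p1 * a, p2 * b) for a, b in zip(s1, s2))


def s_star(p1, s1, p2, s2):
    """The affine combination s* = (p1 s1 - p2 s2) / (p1 - p2); needs p1 != p2."""
    return [(p1 * a - p2 * b) / (p1 - p2) for a, b in zip(s1, s2)]


def in_simplex(v):
    """The coordinates of s* always sum to 1, so only nonnegativity is checked."""
    return all(x >= 0 for x in v)


def is_generic(p1, s1, p2, s2):
    """Generic: some observable beats always guessing the more probable candidate."""
    return bayes_optimum(p1, s1, p2, s2) > max(p1, p2)


# A generic case: s* falls outside the simplex.
p1, s1, p2, s2 = F(3, 4), [F(1, 2), F(1, 2), 0], F(1, 4), [0, F(1, 2), F(1, 2)]
print(is_generic(p1, s1, p2, s2), in_simplex(s_star(p1, s1, p2, s2)))  # True False

# A non-generic case: s* lies in the simplex.
q1, r1, q2, r2 = F(3, 4), [F(1, 2), F(1, 2)], F(1, 4), [F(1, 3), F(2, 3)]
print(is_generic(q1, r1, q2, r2), in_simplex(s_star(q1, r1, q2, r2)))  # False True
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the boundary case, where the optimum equals the larger prior, is decided without rounding error.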
2 A Mathematical Framework for General Probabilistic Theories



In this section, we introduce a mathematical framework for general probabilistic theories. In this article, every vector space is defined over the real field $\mathbb{R}$ unless otherwise specified.

Following the preceding works [?, ?, ?, 13, ?, 16], we start with a set $\mathcal{S}_0$ of states, called a state space, which is a convex structure [9] in the following sense: for two states $s, t \in \mathcal{S}_0$ and two weights $\lambda, \mu \geq 0$ such that $\lambda + \mu = 1$, a state $(\lambda, \mu; s, t) \in \mathcal{S}_0$, called an ensemble of $s, t$ with weights $\lambda, \mu$, is uniquely determined. Physically, $(\lambda, \mu; s, t)$ means the probabilistic state ensemble of $s$ and $t$ with a priori probabilities $\lambda$ and $\mu$. We regard any convex subset of a vector space as a convex structure with the natural operation $(\lambda, \mu; s, t) = \lambda s + \mu t$. Note that no further postulate on the operation $(\lambda, \mu; s, t)$ is required; some natural properties of state ensembles will be induced by the construction of the associated "separated" state space presented below.

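For any convex structure $C$, we say that a functional $f: C \to \mathbb{R}$ on $C$ is affine if $f((\lambda, \mu; s, t)) = \lambda f(s) + \mu f(t)$ for any $s, t \in C$ and any weights $\lambda, \mu$. Let $\mathcal{E}(C)$ denote the set of all affine functionals $e$ on $C$ with image $e(C)$ contained in the unit interval $[0, 1]$. Then we call each $e \in \mathcal{E}(\mathcal{S}_0)$ an effect on $\mathcal{S}_0$. Now we define an equivalence relation $\sim$ on $\mathcal{S}_0$ by setting $s \sim t$ if and only if $e(s) = e(t)$ for every $e \in \mathcal{E}(\mathcal{S}_0)$. Let $\bar{s}$ denote the equivalence class of $s \in \mathcal{S}_0$. Then the quotient set $\overline{\mathcal{S}_0} = \mathcal{S}_0/{\sim}$ is also a convex structure with $(\lambda, \mu; \bar{s}, \bar{t}) = \overline{(\lambda, \mu; s, t)}$ for any $s, t \in \mathcal{S}_0$; this is well defined because every effect is affine, so that $s \sim s'$ and $t \sim t'$ imply $(\lambda, \mu; s, t) \sim (\lambda, \mu; s', t')$. A physical interpretation is that, since two equivalent states (in the above sense) are statistically indistinguishable by any effect, we would have no physical way to distinguish them. (See below for the definition of observables composed of effects.) Now each $e \in \mathcal{E}(\mathcal{S}_0)$ induces an effect $\bar{e} \in \mathcal{E}(\overline{\mathcal{S}_0})$ on $\overline{\mathcal{S}_0}$ by $\bar{e}(\bar{s}) = e(s)$ for each $s \in \mathcal{S}_0$, and this defines a one-to-one correspondence between $\mathcal{E}(\mathcal{S}_0)$ and $\mathcal{E}(\overline{\mathcal{S}_0})$. Moreover, the definition of the set $\overline{\mathcal{S}_0}$ implies the following property (see e.g., [9, 11, 16]):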

Lemma 2.1. The convex structure $\overline{\mathcal{S}_0}$ is separated, in the sense that for any distinct $s, t \in \overline{\mathcal{S}_0}$ there exists an effect $e \in \mathcal{E}(\overline{\mathcal{S}_0})$ such that $e(s) \neq e(t)$.
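To illustrate the relation $\sim$ (our own example, using the quantum setting of Example 2.2), suppose two preparation procedures are regarded as distinct elements of $\mathcal{S}_0$: an equal mixture of the qubit basis states $|0\rangle, |1\rangle$, and an equal mixture of $|+\rangle, |-\rangle$. Both yield the same density operator, so every effect $e(\rho) = \mathrm{tr}\,B\rho$ takes the same value on them and they fall into one equivalence class. The Python sketch below checks this numerically; the helper names and the sample operator $B$ are hypothetical and chosen only for illustration.

```python
import math


def projector(v):
    """Density matrix |v><v| of a normalized qubit vector v = (a, b)."""
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]


def mixture(weights, matrices):
    """Density operator of the ensemble: weighted sum of 2x2 matrices."""
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(2)]
            for i in range(2)]


def effect_value(b, rho):
    """e(rho) = tr(B rho) for 2x2 matrices."""
    return sum(b[i][j] * rho[j][i] for i in range(2) for j in range(2)).real


r = 1 / math.sqrt(2)
ket0, ket1, plus, minus = (1, 0), (0, 1), (r, r), (r, -r)

rho_a = mixture([0.5, 0.5], [projector(ket0), projector(ket1)])
rho_b = mixture([0.5, 0.5], [projector(plus), projector(minus)])

# A sample effect: its eigenvalues are about 0.239 and 0.961, so 0 <= B <= I.
B = [[0.9, 0.2], [0.2, 0.3]]
print(effect_value(B, rho_a), effect_value(B, rho_b))  # both approximately 0.6
```

The two formal ensembles differ as preparation recipes, but since their density operators coincide no effect can tell them apart, which is exactly what the quotient $\mathcal{S}_0/{\sim}$ encodes.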

The next theorem presents our framework involving the separated state space $\overline{\mathcal{S}_0}$. In our framework we intend to introduce as few mathematical structures as possible, subject to physically natural requirements; we call the resulting framework a minimal framework. Here we use the notion of topological vector spaces; we refer to the book [?] for the theory of topological vector spaces together with some relevant topics in general topology. In what follows, we abbreviate "topological vector space" to "t.v.s." and "locally convex" to "l.c.". For any t.v.s. $W$, let $\mathcal{L}_c(W)$ denote the set of all continuous linear functionals $W \to \mathbb{R}$. Moreover, let $\mathcal{T}(X)$ denote the topology on a set $X$ when it is clear from the context. Then the above-mentioned theorem on our minimal framework is the following:

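Theorem 2.1. Given a separated convex structure $\overline{\mathcal{S}_0}$ as above, there exist the following objects:

• an l.c. Hausdorff t.v.s. $\overline{\mathcal{V}}$ (over $\mathbb{R}$);

• a convex subset $\overline{\mathcal{S}}$ of $\overline{\mathcal{V}}$ such that $\overline{\mathcal{V}}$ is the affine hull $\mathrm{Aff}(\overline{\mathcal{S}})$ of $\overline{\mathcal{S}}$;

• a topological vector subspace $\mathcal{V}$ of $\overline{\mathcal{V}}$ that is dense in $\overline{\mathcal{V}}$;

• a convex subset $\mathcal{S}$ of $\mathcal{V}$ such that $\mathrm{Aff}(\mathcal{S}) = \mathcal{V}$,
satisfying the following conditions:

• $\mathcal{S}$ is isomorphic to $\overline{\mathcal{S}_0}$, in the sense that there exists a bijection $\varphi: \overline{\mathcal{S}_0} \to \mathcal{S}$ such that $\varphi((\lambda, \mu; s, t)) = \lambda\varphi(s) + \mu\varphi(t)$ for any $s, t \in \overline{\mathcal{S}_0}$;

• the topology $\mathcal{T}(\overline{\mathcal{V}})$ of $\overline{\mathcal{V}}$ is a weak topology, i.e., the topology with the fewest open subsets that makes every $f \in \mathcal{L}_c(\overline{\mathcal{V}})$ continuous;

• the induced topology on $\mathcal{S}$ is the weakest one that makes every $e \in \mathcal{E}(\mathcal{S})$ continuous;

• the induced topology on $\mathcal{V}$ is the weakest one that makes continuous every linear functional $f: \mathcal{V} \to \mathbb{R}$ such that $f(\mathcal{S}) \subset \mathbb{R}$ is bounded;

• $\overline{\mathcal{S}}$ is the closure $\mathrm{cl}_{\overline{\mathcal{V}}}(\mathcal{S})$ of $\mathcal{S}$ in $\overline{\mathcal{V}}$, and $\overline{\mathcal{S}}$ is compact and complete.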

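$$
\begin{array}{ccc}
\mathrm{cl}_{\overline{\mathcal{V}}}(\mathcal{S}) = \overline{\mathcal{S}} & \subset & \overline{\mathcal{V}} \\
\cup & & \cup \\
\overline{\mathcal{S}_0} = \mathcal{S}_0/{\sim} \;\cong\; \mathcal{S} & \subset & \mathcal{V}
\end{array}
$$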

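Moreover, these objects are unique; namely, for another collection $\overline{\mathcal{V}}'$, $\overline{\mathcal{S}}'$, $\mathcal{V}'$ and $\mathcal{S}'$ of such objects, there exists an affine isomorphism $\overline{\mathcal{V}} \to \overline{\mathcal{V}}'$ that is a homeomorphism and maps each of $\overline{\mathcal{S}}$, $\mathcal{V}$ and $\mathcal{S}$ onto the corresponding object.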

A proof of Theorem 2.1 will be given in the Appendices.

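Remark 2.1. In finite-dimensional cases ($\dim \mathcal{S} = n < \infty$), the space $\overline{\mathcal{V}}$ above is isomorphic to an $n$-dimensional Euclidean space (cf. Theorem 5.1), and we have $\overline{\mathcal{V}} = \mathcal{V}$ and $\overline{\mathcal{S}} = \mathrm{cl}_{\mathcal{V}}(\mathcal{S})$. Hence in such cases, the state space $\mathcal{S}$ is nothing but a bounded convex subset of $\mathbb{R}^n$. Moreover, in this case every $e \in \mathcal{E}(\overline{\mathcal{S}})$ is continuous by the definition of $\mathcal{T}(\overline{\mathcal{V}}) = \mathcal{T}(\mathcal{V})$; however, such continuity is not guaranteed in general.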

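Definition 2.1. We call the sets $\mathcal{S}$, $\overline{\mathcal{S}}$, $\mathcal{V}$ and $\overline{\mathcal{V}}$ a (real) state space, a virtual state space, a (real) underlying space and a virtual underlying space, respectively. We call $s \in \mathcal{S}$ a (real) state and $s \in \overline{\mathcal{S}} \setminus \mathcal{S}$ a virtual state. Moreover, we call each $e \in \mathcal{E} = \mathcal{E}(\mathcal{S})$ a (real) effect on $\mathcal{S}$, and each $e \in \mathcal{E}(\overline{\mathcal{S}})$ a virtual effect on $\overline{\mathcal{S}}$ if it is continuous. Let $\overline{\mathcal{E}}$ denote the set of the virtual effects on $\overline{\mathcal{S}}$, i.e., $\overline{\mathcal{E}} = \{e \in \mathcal{E}(\overline{\mathcal{S}}) \mid e \text{ is continuous}\}$.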

The choice of $\mathcal{T}(\mathcal{S})$ is motivated by the physical intuition that any available information on the state space $\mathcal{S}$ would be obtained via statistical properties of effects on $\mathcal{S}$. On the other hand, the continuity of virtual effects is required to ensure the following correspondence between effects and virtual effects:

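Lemma 2.2. Each effect $e \in \mathcal{E}$ on $\mathcal{S}$ has a unique continuous affine extension $\bar{e}: \overline{\mathcal{S}} \to \mathbb{R}$, and we have $\bar{e} \in \overline{\mathcal{E}}$. This gives a bijection $e \mapsto \bar{e}$ from $\mathcal{E}$ to $\overline{\mathcal{E}}$.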

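Proof. The only nontrivial part is the existence of a continuous affine extension $\bar{e}$ of $e$ with $\bar{e} \in \overline{\mathcal{E}}$; the uniqueness then follows since $\mathcal{S}$ is dense in $\overline{\mathcal{S}}$. First, since $\mathrm{Aff}(\mathcal{S}) = \mathcal{V}$, the effect $e$ extends to an affine functional $f: \mathcal{V} \to \mathbb{R}$. Let $a$ be the value of $f$ at the origin of $\mathcal{V}$; then $f' = f - a: \mathcal{V} \to \mathbb{R}$ is linear. Note that $f'(\mathcal{S}) \subset [-a, 1 - a]$; therefore $f'$ is continuous on $\mathcal{V}$ by the property of $\mathcal{V}$ in Theorem 2.1. By a consequence of the Hahn–Banach Theorem (Theorem D.1), this $f'$ extends to a continuous linear functional $g$ on $\overline{\mathcal{V}}$. Now $g(\overline{\mathcal{S}}) \subset \mathrm{cl}_{\mathbb{R}}(g(\mathcal{S}))$ since $\overline{\mathcal{S}} = \mathrm{cl}_{\overline{\mathcal{V}}}(\mathcal{S})$ and $g$ is continuous, while $g(\mathcal{S}) \subset [-a, 1 - a]$ since $g$ is an extension of $f'$. Thus the restriction $\bar{e} = (g + a)|_{\overline{\mathcal{S}}}$ of $g + a$ to $\overline{\mathcal{S}}$ is a continuous affine functional such that $\bar{e}(\overline{\mathcal{S}}) \subset [0, 1]$; therefore $\bar{e} \in \overline{\mathcal{E}}$. This $\bar{e}$ is the desired extension of $e$. □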

Moreover, the sets $\mathcal{S}$ and $\overline{\mathcal{S}}$ have the following properties:

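Lemma 2.3. Both $\mathcal{S}$ and $\overline{\mathcal{S}}$ are separated, which (for $\overline{\mathcal{S}}$) means that for any distinct $s, t \in \overline{\mathcal{S}}$ there exists an $e \in \overline{\mathcal{E}}$, not just $e \in \mathcal{E}(\overline{\mathcal{S}})$, such that $e(s) \neq e(t)$.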
Proof. Since $\mathcal{V}$ is Hausdorff, $\mathcal{S}$ is separated (in the sense of Lemma 2.1) by the definition of $\mathcal{T}(\mathcal{S})$; see Theorem 2.1. On the other hand, let $s, t$ be distinct elements of $\overline{\mathcal{S}}$. Then, since $\mathcal{T}(\overline{\mathcal{V}})$ is Hausdorff and a weak topology (see Theorem 2.1), there exists a continuous linear functional $f$ on $\overline{\mathcal{V}}$ such that $f(s) \neq f(t)$. Now $f(\overline{\mathcal{S}}) \subset \mathbb{R}$ is bounded since $\overline{\mathcal{S}}$ is compact; therefore the restriction $e$ to $\overline{\mathcal{S}}$ of an appropriate affine transformation $\alpha f + \beta$ of $f$, where $\alpha, \beta \in \mathbb{R}$, is a virtual effect such that $e(s) \neq e(t)$. Hence Lemma 2.3 holds. □
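The "appropriate affine transformation" in the proof can be chosen explicitly: if $m = \inf f(\overline{\mathcal{S}})$ and $M = \sup f(\overline{\mathcal{S}})$ (finite by compactness, with $m < M$ because $f(s) \neq f(t)$), then $\alpha = 1/(M - m)$ and $\beta = -m/(M - m)$ map $f(\overline{\mathcal{S}})$ into $[0, 1]$, and $\alpha > 0$ keeps $f(s)$ and $f(t)$ apart. A minimal Python sketch, in which a finite list of values stands in for the bounded set $f(\overline{\mathcal{S}})$ (an assumption made only for illustration; the function name is hypothetical):

```python
from fractions import Fraction as F


def rescale_to_effect(values):
    """Return (alpha, beta) such that alpha*v + beta lies in [0, 1] for every
    value v of the functional f; distinct values stay distinct since alpha > 0."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("f is constant on these points and separates nothing")
    alpha = 1 / (hi - lo)
    beta = -lo * alpha
    return alpha, beta


values = [F(-3), F(1), F(5)]          # f evaluated at three sample points
alpha, beta = rescale_to_effect(values)
print([alpha * v + beta for v in values])  # [Fraction(0, 1), Fraction(1, 2), Fraction(1, 1)]
```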

Definition 2.2. An $N$-valued (real) observable (or virtual observable, respectively) is a collection $O = (e_i)_{i=1}^N$ of $N$ effects $e_i \in \mathcal{E}$ (or $N$ virtual effects $e_i \in \overline{\mathcal{E}}$, respectively) such that $\sum_{i=1}^N e_i = 1$. Let $\mathcal{O}_N$ and $\overline{\mathcal{O}}_N$ denote the sets of all $N$-valued observables and of all $N$-valued virtual observables, respectively.

Physically, for each observable $O = (e_i)_i$ and each $s \in \mathcal{S}$, the quantity $e_i(s)$ means the probability of obtaining the $i$-th outcome when measuring $O$ at the state $s$; the condition $\sum_i e_i = 1$ is required by the normalization of probability. On the other hand, the affine property of each $e_i$ is motivated by the natural expectation that the output probabilities for a probabilistic state ensemble are the weighted sums of those probabilities for each of the original states. The same also holds for virtual observables. Now we have the following correspondence:

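Lemma 2.4. We have $\overline{O} = (\bar{e}_i)_i \in \overline{\mathcal{O}}_N$ for any $O = (e_i)_i \in \mathcal{O}_N$. This gives a bijection $O \mapsto \overline{O}$ from $\mathcal{O}_N$ to $\overline{\mathcal{O}}_N$.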

Proof. The only nontrivial part is to show that $\sum_i \bar{e}_i = 1$ for any $O = (e_i)_i \in \mathcal{O}_N$. This follows from the uniqueness property in Lemma 2.2, since both $\sum_i \bar{e}_i$ and $1$ are continuous affine extensions of the effect $\sum_i e_i = 1$ to $\overline{\mathcal{S}}$. □

By virtue of Lemma 2.4, the output probabilities of virtual observables at virtual states can be derived (at least in principle) from information on real observables at real states. On the other hand, for any finite collection of measurements with non-ideal accuracy, virtual states are indistinguishable from real states (in the sense mentioned in Sect. 1.2).

Note that our framework presented above does not in fact cover every feature of quantum theory, e.g., transformations of states possibly caused by measuring observables. However, our framework is sufficient for our current purpose of studying state discrimination problems.

Two fundamental examples of general probabilistic theories are given by classical and quantum theories, as follows (taken from [13]):

Example 2.1. A finite classical system, described by a finite probability theory with sample space $\{\omega_1, \dots, \omega_n\}$, is formulated in our model as the $(n-1)$-dimensional standard simplex $\mathcal{S} = \{p = (p_1, \dots, p_n) \in \mathbb{R}^n \mid p_i \geq 0, \sum_i p_i = 1\}$. Namely, each state is a probability distribution over the sample space, and it can be seen as a probabilistic ensemble of the "pure states" $p^{(i)}$, $i = 1, \dots, n$, with only one possible outcome $\omega_i$, which are the extremal points of $\mathcal{S}$ in the usual sense. Note that in this example $\mathcal{S}$ itself is compact; hence all states are real. This example can be naturally extended to infinite-dimensional classical systems.
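For the simplex of Example 2.1, an affine functional is determined by its values at the pure states $p^{(1)}, \dots, p^{(n)}$, so effects correspond to vectors $e \in [0, 1]^n$ acting by $e(p) = \sum_k e_k p_k$, and an $N$-valued observable is a list of such vectors summing to the all-ones vector. The Python sketch below (helper names hypothetical, data chosen for illustration) checks these conditions and the affine behaviour of outcome probabilities under mixing.

```python
from fractions import Fraction as F


def outcome_probabilities(observable, state):
    """(e_1(p), ..., e_N(p)) with e(p) = sum_k e[k] * p[k]."""
    return [sum(ek * pk for ek, pk in zip(e, state)) for e in observable]


def is_observable(observable, n):
    """Each effect takes values in [0, 1] and the effects sum to the unit effect."""
    in_range = all(0 <= x <= 1 for e in observable for x in e)
    sums_to_one = all(sum(e[k] for e in observable) == 1 for k in range(n))
    return in_range and sums_to_one


def mix(lam, p, q):
    """The ensemble (lam, 1 - lam; p, q) realised as lam*p + (1 - lam)*q."""
    return [lam * a + (1 - lam) * b for a, b in zip(p, q)]


O = [[1, F(1, 2), 0], [0, F(1, 2), 1]]      # a two-outcome observable on n = 3
p = [F(1, 2), F(1, 4), F(1, 4)]
q = [0, 0, 1]
lam = F(1, 3)

assert is_observable(O, 3)
lhs = outcome_probabilities(O, mix(lam, p, q))
rhs = [lam * a + (1 - lam) * b
       for a, b in zip(outcome_probabilities(O, p), outcome_probabilities(O, q))]
print(lhs == rhs, lhs)  # True [Fraction(5, 24), Fraction(19, 24)]
```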

Example 2.2. In quantum theory, a quantum state is described by a density operator $\rho$, that is, a positive operator on a Hilbert space $\mathcal{H}$ with unit trace. Thus the state space is a convex subset of the vector space of all linear operators on $\mathcal{H}$. Moreover, an effect $e$ is described [?] by a positive bounded operator $B$ such that $0 \leq B \leq I_{\mathcal{H}}$ via the relation $e(\rho) = \mathrm{tr}\, B\rho$, which is an element of a positive operator valued measure (POVM).
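For a qubit ($\dim\mathcal{H} = 2$) the condition $0 \leq B \leq I_{\mathcal{H}}$ on a Hermitian $B$ amounts to both eigenvalues lying in $[0, 1]$, and then $\{B, I_{\mathcal{H}} - B\}$ is a two-outcome POVM. The Python sketch below, with hypothetical helper names and a sample operator chosen only for illustration, checks this condition via the closed-form eigenvalues of a $2 \times 2$ Hermitian matrix and evaluates $e(\rho) = \mathrm{tr}\,B\rho$.

```python
import math


def eigenvalues_2x2_hermitian(b):
    """Eigenvalues of a Hermitian matrix [[a, c], [conj(c), d]] (a, d real)."""
    a, c, d = b[0][0].real, b[0][1], b[1][1].real
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(c) ** 2)
    return mean - radius, mean + radius


def is_effect(b, tol=1e-12):
    """0 <= B <= I holds iff both eigenvalues of the Hermitian B lie in [0, 1]."""
    lo, hi = eigenvalues_2x2_hermitian(b)
    return lo >= -tol and hi <= 1 + tol


def born_probability(b, rho):
    """e(rho) = tr(B rho)."""
    return sum(b[i][j] * rho[j][i] for i in range(2) for j in range(2)).real


B = [[0.9, 0.2], [0.2, 0.3]]
B_complement = [[1 - B[0][0], -B[0][1]], [-B[1][0], 1 - B[1][1]]]   # I - B
rho_plus = [[0.5, 0.5], [0.5, 0.5]]                                 # |+><+|

print(is_effect(B), is_effect(B_complement))                        # True True
print(born_probability(B, rho_plus) + born_probability(B_complement, rho_plus))
```

The second line prints the total probability of the two outcomes, which equals 1 up to floating-point rounding.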

At the end of this section, we give two remarks on relations with preceding works. Before starting the remarks, note that the assumptions of compactness of the state space $\mathcal{S}$ and of completeness of $\mathcal{S}$ are equivalent to each other, since each of the two implies that $\mathcal{S}$ is closed in $\overline{\mathcal{V}}$ and hence $\overline{\mathcal{S}} = \mathcal{S}$.
Remark 2.2. In a recent work by Barnum et al. [1], the finite-dimensional state space is assumed to be compact to guarantee that it is the closed convex hull of the set of "pure states" (i.e., extremal points of the state space). Owing to the Krein–Milman Theorem (see e.g., Theorem 10.4 in [11, Chapter II]), the same property is possessed by our (possibly infinite-dimensional) virtual state space $\overline{\mathcal{S}}$. Thus it is very tempting to start our argument by choosing the compact set $\overline{\mathcal{S}}$ as a new "state space" instead of $\mathcal{S}$. However, such a modification would decrease the generality of our framework. Namely, it is not guaranteed in general that every $e \in \mathcal{E}(\overline{\mathcal{S}})$, which should be a new "effect" under the above modification, is continuous with respect to the original topology of $\overline{\mathcal{S}}$. Thus, to ensure that every "effect" is continuous, we need a new topology stronger than the original one, and the set $\overline{\mathcal{S}}$ may then fail to be compact with respect to the new topology. Hence the advantage of choosing $\overline{\mathcal{S}}$ as a state space disappears.
Remark 2.3. In another previous work by Gudder [9], the following distance $d(s, s')$ of two states $s, s' \in \mathcal{S}$ was introduced to make $\mathcal{S}$ a metric space. Namely, Gudder defined $d(s, s')$ to be the infimum of the values $0 \leq \lambda < 1$ such that $\lambda t + (1 - \lambda)s = \lambda t' + (1 - \lambda)s'$ for some states $t, t' \in \mathcal{S}$. Since this relation implies that $(1 - \lambda)|e(s) - e(s')| = \lambda|e(t) - e(t')| \leq \lambda$ for any $e \in \mathcal{E}$, every effect is continuous with respect to the metric $d$ on $\mathcal{S}$. However, unless $\mathcal{S}$ is finite-dimensional, the metric $d$ is not necessarily continuous with respect to the topology of $\mathcal{S}$ specified in Theorem 2.1. This is roughly because, for a state $s \in \mathcal{S}$ and any finite collection of effects $e_1, \dots, e_k$, the metric $d$ is not necessarily bounded by a sufficiently small value on the intersection of the hyperplanes containing $s$ defined by the affine functionals $e_j$. Thus our topology on $\mathcal{S}$ is weaker than (or equal to) the topology defined by the metric $d$. Moreover, another relation of our results with Gudder's metric will be mentioned later (Remark 5.1).
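In the classical case of Example 2.1 the infimum in Gudder's definition can be computed in closed form. By our own elementary derivation (not taken from [9]): writing $D(s, s') = \sum_x \max(s(x) - s'(x), 0)$ for the total variation distance, the relation $(1 - \lambda)(s - s') = \lambda(t' - t)$ forces $(1 - \lambda)D(s, s') \leq \lambda$, and equality is reached with $t \propto (s' - s)_+$ and $t' \propto (s - s')_+$; hence $d(s, s') = D/(1 + D)$. The Python sketch below (helper names hypothetical) constructs this witness and checks the defining relation exactly.

```python
from fractions import Fraction as F


def total_variation(s, sp):
    """D(s, s') = sum_x max(s(x) - s'(x), 0) for probability vectors."""
    return sum(max(a - b, 0) for a, b in zip(s, sp))


def gudder_witness(s, sp):
    """Return (lam, t, tp) with lam*t + (1-lam)*s == lam*tp + (1-lam)*sp, where
    lam = D/(1+D) is the infimum in Gudder's definition (our classical derivation)."""
    d = total_variation(s, sp)
    if d == 0:
        return F(0), list(s), list(sp)        # s == s': lam = 0 works with any t, t'
    lam = d / (1 + d)
    t = [max(b - a, 0) / d for a, b in zip(s, sp)]    # mass where s' exceeds s
    tp = [max(a - b, 0) / d for a, b in zip(s, sp)]   # mass where s exceeds s'
    return lam, t, tp


def combine(lam, u, v):
    """lam*u + (1 - lam)*v coordinatewise."""
    return [lam * x + (1 - lam) * y for x, y in zip(u, v)]


s = [F(1, 2), F(1, 2), 0]
sp = [0, F(1, 2), F(1, 2)]
lam, t, tp = gudder_witness(s, sp)
print(lam, combine(lam, t, s) == combine(lam, tp, sp))  # 1/3 True
```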

3 State Discrimination Problems 

In this section, we give a formulation of (minimum-error) state discrimination problems in general probabilistic theories based on the minimal framework introduced in Sect. 2. This formulation is a natural generalization of state discrimination problems for quantum systems, and in fact a direct translation of our preceding formulation [13] to the present, more general setting.

In the state discrimination problem, we are given a finite number (say $N$) of real states $s_1, \dots, s_N \in \mathcal{S}$ and the corresponding a priori probabilities $p_1, \dots, p_N$, $p_i \geq 0$, $\sum_i p_i = 1$. To avoid inessential intricacy, we assume that each probability $p_i$ is positive. Then for each $N$-valued observable $O = (e_i)_i \in \mathcal{O}_N$, we define the success probability $P_{\mathrm{succ}}(O)$ of the observable $O$ by

$$P_{\mathrm{succ}}(O) = \sum_{i=1}^{N} p_i e_i(s_i). \tag{1}$$

Namely, when measuring the observable $O$ at an unknown state that is chosen from $s_1, \dots, s_N$ with probabilities $p_1, \dots, p_N$ (thus the unknown state is regarded as the probabilistic ensemble $\sum_i p_i s_i \in \mathcal{S}$), the $i$-th outcome of $O$ corresponds to the guess that the chosen state was originally $s_i$. (Without loss of generality, it suffices to consider $N$-valued observables when discriminating $N$ states.) Our aim is to make the success probability as high as possible. The optimal success probability $P^{\mathrm{opt}}_{\mathrm{succ}}$ is naturally defined by

$$P^{\mathrm{opt}}_{\mathrm{succ}} = \sup_{O \in \mathcal{O}_N} P_{\mathrm{succ}}(O), \tag{2}$$

and an observable $O \in \mathcal{O}_N$ is called optimal if it attains the supremum, namely $P_{\mathrm{succ}}(O) = P^{\mathrm{opt}}_{\mathrm{succ}}$. However, whether or not an optimal observable exists in each case is nontrivial. Ozawa [16] has proven the existence of Bayes optimal measurements under a somewhat different formulation. The existence theorem also holds in our situation. Here we present the theorem together with a proof that is significantly simpler than the one in [16], as follows:

Theorem 3.1. The supremum in the right-hand side of (2) is attained by an observable $O \in \mathcal{O}_N$. Hence an optimal observable always exists.
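Theorem 3.1 is an existence statement. For the finite classical systems of Example 2.1 an optimal observable can also be written down explicitly: since (1) is linear in the effect vectors and $\sum_i e_i(x) = 1$ at each outcome $x$, it is optimal to assign each outcome $x$ entirely to an index maximizing $p_i s_i(x)$ (the classical Bayes decision rule). The Python sketch below (names and data hypothetical) compares this rule with a brute-force search over all deterministic observables on a small instance.

```python
from fractions import Fraction as F
from itertools import product


def success_probability(priors, states, observable):
    """Eq. (1): P_succ(O) = sum_i p_i e_i(s_i), effects given as vectors in [0, 1]^n."""
    return sum(p * sum(e_x * s_x for e_x, s_x in zip(e, s))
               for p, s, e in zip(priors, states, observable))


def bayes_observable(priors, states):
    """Deterministic observable assigning each outcome x to an argmax of p_i s_i(x)."""
    n, N = len(states[0]), len(states)
    observable = [[0] * n for _ in range(N)]
    for x in range(n):
        best = max(range(N), key=lambda i: priors[i] * states[i][x])
        observable[best][x] = 1
    return observable


def best_deterministic(priors, states):
    """Brute force over all N**n deterministic observables (tiny instances only)."""
    n, N = len(states[0]), len(states)
    best = F(0)
    for guess in product(range(N), repeat=n):
        observable = [[1 if guess[x] == i else 0 for x in range(n)] for i in range(N)]
        best = max(best, success_probability(priors, states, observable))
    return best


priors = [F(1, 2), F(1, 3), F(1, 6)]
states = [[F(1, 2), F(1, 4), F(1, 4)],
          [F(1, 4), F(1, 2), F(1, 4)],
          [0, 0, 1]]
opt = success_probability(priors, states, bayes_observable(priors, states))
print(opt, best_deterministic(priors, states))  # 7/12 7/12
```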

The rest of this section is devoted to the proof of Theorem 3.1; note that virtual states do not appear at all in this proof. The outline is the following: with respect to a certain topology, the set $\mathcal{O}_N$ of $N$-valued observables is compact and the map $\mathcal{O}_N \to \mathbb{R}$, $O \mapsto P_{\mathrm{succ}}(O)$, is continuous; therefore this map takes its maximum value at some $O \in \mathcal{O}_N$. Now we introduce a map $\iota: \mathcal{E} \to [0, 1]^{\mathcal{S}}$ from $\mathcal{E}$ to the direct product $[0, 1]^{\mathcal{S}} = \prod_{s \in \mathcal{S}} [0, 1]_s$ of copies $[0, 1]_s$ of the unit interval $[0, 1]$ over all $s \in \mathcal{S}$, by $\iota(e) = (e(s))_{s \in \mathcal{S}}$ for any $e \in \mathcal{E}$. Then $\iota$ is injective; therefore $\mathcal{E}$ is identified with the topological subspace $\iota(\mathcal{E})$ of the product space $[0, 1]^{\mathcal{S}}$. By the definition of the product topology, $\mathcal{T}([0, 1]^{\mathcal{S}})$ is the weakest topology that makes every projection $\pi_s: [0, 1]^{\mathcal{S}} \to [0, 1]_s$ ($s \in \mathcal{S}$) continuous. Thus the topology on $\mathcal{E}$ induced by the identification is the weakest one that makes every "evaluation map" $\mathrm{ev}_s: \mathcal{E} \to [0, 1]$, $\mathrm{ev}_s(e) = e(s)$ ($s \in \mathcal{S}$), continuous. Now the following holds:

Lemma 3.1. $\iota(\mathcal{E})$ is a closed subset of $[0, 1]^{\mathcal{S}}$.

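Proof. For each $s, t \in \mathcal{S}$ and $0 \leq \lambda \leq 1$, put $s' = \lambda s + (1 - \lambda)t \in \mathcal{S}$, and let

$$A_{s,t,\lambda} = \{ f \in [0, 1]^{\mathcal{S}} \mid \pi_{s'}(f) - \lambda\pi_s(f) - (1 - \lambda)\pi_t(f) = 0 \}.$$

Then $A_{s,t,\lambda}$ is a closed subset of $[0, 1]^{\mathcal{S}}$, since the function $\pi_{s'} - \lambda\pi_s - (1 - \lambda)\pi_t$ on $[0, 1]^{\mathcal{S}}$ is continuous. Moreover, the affine property of the effects implies that $\iota(\mathcal{E})$ is the intersection of all the subsets $A_{s,t,\lambda}$. Hence $\iota(\mathcal{E})$ is also closed in $[0, 1]^{\mathcal{S}}$, and Lemma 3.1 holds. □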
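The sets $A_{s,t,\lambda}$ can be explored concretely, although on a finite sample only finitely many of the defining constraints can be tested, so passing the test does not by itself prove membership in $\iota(\mathcal{E})$. In the sketch below (a hypothetical discretization, not part of the proof), the state space of a classical bit is sampled at $q \in \{0, 1/4, 1/2, 3/4, 1\}$, a candidate $f \in [0, 1]^{\mathcal{S}}$ is stored as a dictionary of its sampled coordinates $\pi_q(f)$, and every constraint $\pi_{s'}(f) = \lambda\pi_s(f) + (1 - \lambda)\pi_t(f)$ whose points all lie in the sample is checked. An affine candidate passes, while $q \mapsto q^2$ fails.

```python
from fractions import Fraction as F
from itertools import product

SAMPLE = [F(k, 4) for k in range(5)]    # sampled states q of a classical bit
WEIGHTS = [F(k, 4) for k in range(5)]   # sampled mixing weights lambda


def in_all_sampled_A(f):
    """True iff f (dict q -> value) lies in [0, 1] on the sample and satisfies
    pi_{s'}(f) = lam*pi_s(f) + (1 - lam)*pi_t(f) whenever s' = lam*s + (1-lam)*t
    is itself a sampled point."""
    if not all(0 <= v <= 1 for v in f.values()):
        return False
    for s, t, lam in product(SAMPLE, SAMPLE, WEIGHTS):
        s_mix = lam * s + (1 - lam) * t
        if s_mix in f and f[s_mix] != lam * f[s] + (1 - lam) * f[t]:
            return False
    return True


affine = {q: (1 + q) / 2 for q in SAMPLE}    # restriction of an effect
square = {q: q * q for q in SAMPLE}          # [0, 1]-valued but not affine
print(in_all_sampled_A(affine), in_all_sampled_A(square))  # True False
```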

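By Tychonoff's Theorem, the product space $[0, 1]^{\mathcal{S}}$ is compact; therefore $\mathcal{E}$ is also compact with respect to the above topology by Lemma 3.1. Thus the product space $\mathcal{E}^N$ is also compact, owing to Tychonoff's Theorem again. Moreover, a similar argument implies that the subset $\mathcal{O}_N$ of $\mathcal{E}^N$ is closed in $\mathcal{E}^N$, since the map $\mathcal{E}^N \to \mathbb{R}$, $(e_i)_{i=1}^N \mapsto \sum_{i=1}^N e_i(s)$, is continuous for every $s \in \mathcal{S}$. Thus $\mathcal{O}_N$ is also compact. Finally, with respect to the topology on $\mathcal{O}_N$, the above function $O \mapsto P_{\mathrm{succ}}(O)$ on $\mathcal{O}_N$ is continuous, since by (1) it is a finite linear combination of the continuous maps $(e_j)_j \mapsto \mathrm{ev}_{s_i}(e_i)$. Hence the proof of Theorem 3.1 is concluded.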

4 Helstrom Families 

In Sect. 3 we have seen that an optimal observable to discriminate given states always exists in general probabilistic theories. In the quantum case, the state discrimination problem has been intensively investigated (e.g., [3, 10, 12, 22]), but strategies for attaining optimal solutions have been well established only in restricted cases, such as two-state cases (cf. [10]) and some symmetric cases (cf. [3]). To study this problem in general probabilistic theories, our preceding work [13] introduced and studied the notion of "(weak) Helstrom families" from a geometric viewpoint. In this section, we translate the preceding formulation to our minimal framework.

Recall that we are given $N$ states $s_i \in \mathcal{S}$ with a priori probabilities $p_i > 0$, $\sum_i p_i = 1$. Then weak Helstrom families are defined as follows (cf. Definition 1 in [13]):

Definition 4.1. We call a family of $N$ ensembles $(\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i)$, $i = 1, \dots, N$, a weak Helstrom family if there exist a quantity $p \geq \max_i p_i$ called a Helstrom ratio, $N$ real or virtual states $t_i \in \overline{\mathcal{S}}$, $i = 1, \dots, N$, called conjugate states to $s_i$, and a real or virtual state $s \in \overline{\mathcal{S}}$ called a reference state, such that

$$\tilde{p}_i s_i + (1 - \tilde{p}_i) t_i = s, \quad \text{with } \tilde{p}_i = \frac{p_i}{p} \leq 1 \tag{3}$$

for every $i$. We call a weak Helstrom family trivial when $p \geq 1$, and nontrivial when $p < 1$.

Example 4.1. In Fig. 1 we consider the case $N = 3$ and $p_i = 1/3$ ($i = 1, 2, 3$). The three states $t_1, t_2, t_3$ are placed so that their configuration is similar to that of $s_1, s_2, s_3$ with respect to the center of similarity $s$, with similarity ratio $|t_i s| / |s_i s| = 2/1$ (the ratio of the lengths of the segments $t_i s$ and $s_i s$). Now these form a weak Helstrom family with $\tilde{p}_i = 2/3$; therefore the Helstrom ratio is $p = p_i/\tilde{p}_i = 1/2$. Note that any other similar configuration with a larger similarity ratio gives a weak Helstrom family with larger $\tilde{p}_i$, and hence with a smaller Helstrom ratio.




Figure 1: An example of a weak Helstrom family
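The numbers in Example 4.1 follow from (3) by a short computation: if $t_i - s = -k(s_i - s)$ with similarity ratio $k$, then $\tilde{p}_i(s_i - s) + (1 - \tilde{p}_i)(t_i - s) = (\tilde{p}_i - (1 - \tilde{p}_i)k)(s_i - s)$ vanishes exactly when $\tilde{p}_i = k/(1 + k)$, so $p = p_i(1 + k)/k$. The Python sketch below (helper names and the sample triangle are hypothetical; it does not check whether the $t_i$ lie in $\overline{\mathcal{S}}$, which depends on the state space drawn in Fig. 1) reproduces $\tilde{p}_i = 2/3$ and $p = 1/2$ for $k = 2$.

```python
from fractions import Fraction as F


def helstrom_weights(prior, k):
    """For conjugate states t_i = s - k (s_i - s), (3) holds with
    tilde_p = k / (1 + k); the Helstrom ratio is then p = prior / tilde_p."""
    tilde_p = F(k) / (1 + k)
    return tilde_p, prior / tilde_p


def conjugate_point(s_i, center, k):
    """Image of s_i under the homothety with centre `center` and factor -k."""
    return tuple(c - k * (a - c) for a, c in zip(s_i, center))


def combine(w, u, v):
    """w*u + (1 - w)*v coordinatewise."""
    return tuple(w * x + (1 - w) * y for x, y in zip(u, v))


center = (0, 0)
triangle = [(1, 0), (0, 1), (-1, -1)]        # centroid at the origin
tilde_p, p = helstrom_weights(F(1, 3), 2)
print(tilde_p, p)                            # 2/3 1/2
print(all(combine(tilde_p, s_i, conjugate_point(s_i, center, 2)) == center
          for s_i in triangle))              # True
```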

In the original paper [13], a weak Helstrom family was required to satisfy the additional condition $p < 1$, but here we relax this condition to simplify the argument. Note that a trivial weak Helstrom family with Helstrom ratio $p = 1$ always exists, obtained by taking the conjugate states $t_i = (1 - p_i)^{-1}\sum_{j \neq i} p_j s_j$ and the reference state $s = \sum_i p_i s_i$ (indeed, $p_i s_i + (1 - p_i)t_i = \sum_j p_j s_j$). Example 4.1 suggests that, intuitively, some nontrivial weak Helstrom families can be found as well by taking states $t_i$ in a larger configuration (cf. [13]). The importance of weak Helstrom families for state discrimination problems is implied by the following property, which was proven in [13] within the framework there:

Proposition 4.1 (cf. Proposition 1 in [13]). For any weak Helstrom family with Helstrom ratio $p$, we have $P^{\mathrm{opt}}_{\mathrm{succ}} \leq p$ for the optimal success probability.

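Proof. The idea of the proof is essentially the same as in [13]. For any observable $O = (e_i)_i \in \mathcal{O}_N$ with the corresponding virtual observable $\overline{O} = (\bar{e}_i)_i \in \overline{\mathcal{O}}_N$ (see Lemma 2.4), we have

$$
\begin{aligned}
1 &= \sum_{i=1}^N \bar{e}_i(s) && \text{(using } \textstyle\sum_i \bar{e}_i = 1\text{)} \\
&= \sum_{i=1}^N \bar{e}_i\bigl(\tilde{p}_i s_i + (1 - \tilde{p}_i) t_i\bigr) && \text{(using (3))} \\
&= \sum_{i=1}^N \tilde{p}_i \bar{e}_i(s_i) + \sum_{i=1}^N (1 - \tilde{p}_i) \bar{e}_i(t_i) && \text{(using } \bar{e}_i \in \overline{\mathcal{E}}\text{)} \\
&= \frac{1}{p} \sum_{i=1}^N p_i e_i(s_i) + \sum_{i=1}^N (1 - \tilde{p}_i) \bar{e}_i(t_i) && \text{(using (3) and } \bar{e}_i|_{\mathcal{S}} = e_i\text{)} \\
&= \frac{P_{\mathrm{succ}}(O)}{p} + \sum_{i=1}^N (1 - \tilde{p}_i) \bar{e}_i(t_i) && \text{(using (1))}.
\end{aligned}
$$

Since $\tilde{p}_i \leq 1$ and $\bar{e}_i(t_i) \geq 0$, the second term of the last row is nonnegative; therefore we have $P_{\mathrm{succ}}(O) \leq p$ for any $O \in \mathcal{O}_N$. Hence Proposition 4.1 holds. □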
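The identity behind the proof, $P_{\mathrm{succ}}(O) = p\bigl(1 - \sum_i (1 - \tilde{p}_i)\bar{e}_i(t_i)\bigr)$, can be checked numerically once a concrete weak Helstrom family is available. For two classical states we use the following construction of our own, given only for illustration and valid in the generic case where neither weighted likelihood dominates everywhere: $p = \sum_x \max(p_1 s_1(x), p_2 s_2(x))$, $t_1 \propto (p_2 s_2 - p_1 s_1)_+$ and $t_2 \propto (p_1 s_1 - p_2 s_2)_+$. The Python sketch below (names and data hypothetical) verifies (3) for this family, then the identity and the bound for randomly drawn observables.

```python
import random
from fractions import Fraction as F


def pair(e, state):
    """e(state) = sum_x e(x) state(x) for classical effects and states."""
    return sum(a * b for a, b in zip(e, state))


def classical_family(p1, s1, p2, s2):
    """Illustrative weak Helstrom family for two classical states (generic case)."""
    plus = [max(p1 * a - p2 * b, 0) for a, b in zip(s1, s2)]
    minus = [max(p2 * b - p1 * a, 0) for a, b in zip(s1, s2)]
    a_mass, b_mass = sum(plus), sum(minus)
    if a_mass == 0 or b_mass == 0:
        raise ValueError("non-generic case: one weighted likelihood dominates")
    p = p2 + a_mass                           # equals p1 + b_mass
    t1 = [m / b_mass for m in minus]
    t2 = [m / a_mass for m in plus]
    ref = [max(p1 * a, p2 * b) / p for a, b in zip(s1, s2)]
    return p, (p1 / p, p2 / p), (t1, t2), ref


p1, s1 = F(3, 5), [F(1, 2), F(1, 2), 0]
p2, s2 = F(2, 5), [0, F(1, 2), F(1, 2)]
p, (q1, q2), (t1, t2), ref = classical_family(p1, s1, p2, s2)

# Eq. (3): tilde_p_i s_i + (1 - tilde_p_i) t_i equals the reference state for both i.
for q, s_i, t_i in ((q1, s1, t1), (q2, s2, t2)):
    assert [q * a + (1 - q) * b for a, b in zip(s_i, t_i)] == ref

rng = random.Random(0)
for _ in range(200):
    e1 = [F(rng.randint(0, 10), 10) for _ in range(3)]   # random effect in [0, 1]^3
    e2 = [1 - x for x in e1]                              # so that e1 + e2 = 1
    p_succ = p1 * pair(e1, s1) + p2 * pair(e2, s2)
    assert p_succ == p * (1 - (1 - q1) * pair(e1, t1) - (1 - q2) * pair(e2, t2))
    assert p_succ <= p
print(p)  # 4/5
```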

Note that the bound $P^{\mathrm{opt}}_{\mathrm{succ}} \leq p$ given by Proposition 4.1 is meaningless when the weak Helstrom family is trivial. Thus the only weak Helstrom families that are significant for our purpose are the nontrivial ones. As mentioned in Example 4.1, enlarging the configuration of the conjugate states makes the Helstrom ratio smaller, and hence makes the bound given by Proposition 4.1 closer to the tight one. We are interested in whether or not the tight bound can be achieved just by this strategy. Motivated by this observation, the notion of "Helstrom families", a special subclass consisting of "optimal" weak Helstrom families, was introduced in [13]:

Definition 4.2 (cf. Definition 2 in [13]). We call a weak Helstrom family a Helstrom family if its Helstrom ratio $p$ attains the tight bound: $p = P^{\mathrm{opt}}_{\mathrm{succ}}$.

If a Helstrom family exists, then we can determine (at least in principle) the optimal success probability just by searching for (weak) Helstrom families with a certain (for example, geometric) method. However, the existence of Helstrom families was proven in the original work [13] only for some restricted cases. In this article, we investigate the existence of Helstrom families in more general situations.

For this purpose, it is worth studying conditions for a weak Helstrom family to be a Helstrom family. In one direction, a sufficient condition was given in [13]. Here we prove the same result within the present framework:

Proposition 4.2 (cf. Proposition 2 in [13]). A sufficient condition for a weak Helstrom family $(\bar p_i, s_i; 1-\bar p_i, t_i)$, $i = 1, \dots, N$, to be a Helstrom family is that there exists $O = (e_i)_{i=1}^N \in \mathcal O_N$ such that $\bar e_i(t_i) = 0$ for every $i$. Moreover, such an observable $O$ (if it exists) is optimal: $P_{\rm succ}(O) = P^{\rm opt}_{\rm succ}$.

Proof. The idea is again the same as in [13]. For such an observable $O$, the argument in the proof of Proposition 4.1 implies that

$$1 = \frac{P_{\rm succ}(O)}{p} + \sum_i (1-\bar p_i)\,\bar e_i(t_i) = \frac{P_{\rm succ}(O)}{p},$$

hence $P_{\rm succ}(O) = p$. Now we have $P_{\rm succ}(O) \le P^{\rm opt}_{\rm succ} \le p = P_{\rm succ}(O)$ by Proposition 4.1; therefore $P_{\rm succ}(O) = P^{\rm opt}_{\rm succ} = p$. Hence Proposition 4.2 holds. □
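In a finite classical theory one can write down a weak Helstrom family together with an observable satisfying the condition of Proposition 4.2, so that the family is in fact a Helstrom family. The construction below is our own illustrative assumption, not taken from [13]: the reference state is proportional to $\max_i p_i s_i(x)$, its normaliser is the Helstrom ratio $p$, and the observable answers $i$ exactly where $p_i s_i(x)$ is largest. The helper names and data are hypothetical.

```python
def evaluate(effect, state):
    """e(s) = sum_x e(x) s(x) in a finite classical theory."""
    return sum(e * x for e, x in zip(effect, state))

def classical_helstrom_family(priors, states):
    """Weak Helstrom family in a finite classical theory (illustrative construction).

    The reference state is s(x) = max_i p_i s_i(x) / p with p = sum_x max_i p_i s_i(x).
    Assumes p > p_i for every i, so that each conjugate state t_i is well defined.
    """
    dim = len(states[0])
    peak = [max(pi * si[x] for pi, si in zip(priors, states)) for x in range(dim)]
    p = sum(peak)
    s = [v / p for v in peak]
    bar_p = [pi / p for pi in priors]
    conj = [[(s[x] - bp * si[x]) / (1 - bp) for x in range(dim)]
            for bp, si in zip(bar_p, states)]
    return p, bar_p, s, conj

def argmax_observable(priors, states):
    """Deterministic observable: e_i(x) = 1 iff i maximises p_i s_i(x) (first index on ties)."""
    n, dim = len(states), len(states[0])
    obs = [[0.0] * dim for _ in range(n)]
    for x in range(dim):
        best = max(range(n), key=lambda i: priors[i] * states[i][x])
        obs[best][x] = 1.0
    return obs

priors = [0.5, 0.3, 0.2]
states = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.2, 0.6]]
p, bar_p, s, conj = classical_helstrom_family(priors, states)
obs = argmax_observable(priors, states)

# Condition of Proposition 4.2: e_i(t_i) = 0 for every i ...
assert all(abs(evaluate(e, t)) < 1e-12 for e, t in zip(obs, conj))
# ... hence the observable is optimal and P_succ(O) equals the Helstrom ratio p
p_succ = sum(pi * evaluate(e, si) for pi, e, si in zip(priors, obs, states))
assert abs(p_succ - p) < 1e-12
```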
Again, Helstrom families are closely related to optimal state discrimination via Proposition 4.2. In the special case of two-state discrimination (i.e., $N = 2$), the above condition can be rephrased as follows. Here we use the following terminology:

Definition 4.3. Two real or virtual states $t_1, t_2 \in \bar S$ are said to be distinguishable if there exists an $\bar e \in \bar{\mathcal E}$ such that $\bar e(t_1) = 1$ and $\bar e(t_2) = 0$, i.e., the virtual observable $(1-\bar e, \bar e) \in \bar{\mathcal O}_2$ discriminates $t_1$ and $t_2$ with certainty.

Then the rephrased condition is the following: 

Corollary 4.1 (cf. Theorem 1 in [13]). Let $(\bar p_i, s_i; 1-\bar p_i, t_i)$, $i = 1, 2$, be a weak Helstrom family for two states $s_1, s_2 \in S$ with a priori probabilities $p_1, p_2$. If the conjugate states $t_1$ and $t_2$ are distinguishable, then this weak Helstrom family is a Helstrom family. Moreover, an optimal observable is given by $O = (1-e, e)$, where $e$ is an effect whose corresponding virtual effect $\bar e$ distinguishes $t_1$ and $t_2$.

Now, owing to the existence of an optimal observable (Theorem 3.1), we obtain a "converse" of the above facts. To state the result precisely, we recall the following notion introduced in [13]:

Definition 4.4 ([13]). By a generic case we mean any case in which the optimal success probability satisfies $P^{\rm opt}_{\rm succ} > \max_j p_j$, and by a non-generic case we mean any of the remaining cases, i.e., $P^{\rm opt}_{\rm succ} = \max_j p_j$.

This definition means that, in non-generic cases, an optimal observable is always given by the trivial one that always returns the $i$-th output, where the index $i$ is determined by $p_i = \max_j p_j$; namely, we always guess that the given state is the a priori most probable one, $s_i$. Hence the state discrimination problem is nontrivial only in generic cases. Now we give the following result, stating that the sufficient condition in Proposition 4.2 is also necessary in generic cases:

Proposition 4.3. Let $(\bar p_i, s_i; 1-\bar p_i, t_i)$, $i = 1, \dots, N$, be a Helstrom family. Then, in generic cases, every optimal observable $O = (e_i)_i \in \mathcal O_N$ for discriminating the given states satisfies $\bar e_i(t_i) = 0$ for every $i$.

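Proof. For any optimal observable $O = (e_i)_i$, since $P_{\rm succ}(O) = P^{\rm opt}_{\rm succ} = p$, the argument in the proof of Proposition 4.1 implies that

$$1 = \frac{P_{\rm succ}(O)}{p} + \sum_i (1-\bar p_i)\,\bar e_i(t_i) = 1 + \sum_i (1-\bar p_i)\,\bar e_i(t_i),$$

therefore $\sum_i (1-\bar p_i)\,\bar e_i(t_i) = 0$. Since every term is nonnegative, we have either $\bar p_i = 1$ for some $i$, or $\bar e_i(t_i) = 0$ for every $i$. Now if $\bar p_i = 1$, then $P^{\rm opt}_{\rm succ} = p = p_i/\bar p_i = p_i \le \max_j p_j$, contradicting the assumption that we are in a generic case. Hence Proposition 4.3 holds. □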

Corollary 4.2. Let $(\bar p_i, s_i; 1-\bar p_i, t_i)$, $i = 1, 2$, be a Helstrom family for two states $s_1, s_2$ with a priori probabilities $p_1, p_2$. Then, in generic cases, the conjugate states $t_1$ and $t_2$ are distinguishable by the virtual effect $\bar e \in \bar{\mathcal E}$ corresponding to any optimal observable $O = (1-e, e) \in \mathcal O_2$ for discriminating the states $s_1$ and $s_2$.

Proof. By Proposition 4.3, an optimal observable $O = (1-e, e) \in \mathcal O_2$ satisfies $(1-\bar e)(t_1) = 0$ and $\bar e(t_2) = 0$; therefore $\bar e(t_1) = 1$. □
5 Existence of Helstrom Families for Two-State Cases



In Sect. 4 we presented some properties of (weak) Helstrom families for $N$-state cases. However, the existence of Helstrom families has not been clarified so far. In this section, we investigate the existence of Helstrom families, particularly in two-state cases, i.e., $N = 2$. Note that our argument in this section works in a general setting, not necessarily classical or quantum, and is not restricted to finite-dimensional cases either.

Throughout this section, fix states $s_1, s_2 \in S$ and a priori probabilities $p_1, p_2$. For simplicity, we assume that $s_1 \ne s_2$ and, by symmetry, $p_1 \ge p_2$. Any (weak) Helstrom family in this section is for $s_1, s_2$ and $p_1, p_2$ unless otherwise specified.



5.1 A condition for generic cases 

In this subsection, we present a condition for generic cases for later use. First we introduce an element $s^*$ of $V$ that plays a significant role in the following argument. Recall that we have assumed $p_1 \ge p_2$. If $p_1 > p_2$, then define

$$s^* = \frac{p_1 s_1 - p_2 s_2}{p_1 - p_2} = s_1 + \frac{p_2}{p_1 - p_2}(s_1 - s_2).$$

Note that $s^* \in V$ since $s_1$ and $s_2$ are real states; therefore we have $s^* \in \bar S$ if and only if $s^* \in {\rm cl}_V(S)$. Then the aforementioned condition is the following:

Lemma 5.1. 1. If the following condition

$$\text{either } p_1 = p_2, \text{ or } p_1 > p_2 \text{ and } s^* \notin \bar S \tag{4}$$

is satisfied and a Helstrom family exists, then it is a generic case.

2. If $p_1 > p_2$ and $s^* \in \bar S$, then it is a non-generic case.

Proof. Note that for any Helstrom family $(\bar p_i, s_i; 1-\bar p_i, t_i)$, $i = 1, 2$, it is a non-generic case if and only if $\bar p_1 = 1$ (since $p_1 \ge p_2$). Now if $p_1 = p_2$ and a Helstrom family exists, then $\bar p_1 = 1$ implies that $\bar p_2 = 1$ and $s = s_1 = s_2$ (see (3)), contradicting the assumption $s_1 \ne s_2$. If $p_1 > p_2$, $s^* \notin \bar S$ and a Helstrom family exists, then $\bar p_1 = 1$ implies that $p_1 = p = p_2/\bar p_2$, $s = s_1 = \bar p_2 s_2 + (1-\bar p_2)t_2$, and

$$t_2 = \frac{s_1 - \bar p_2 s_2}{1 - \bar p_2} = \frac{p_1 s_1 - p_1 \bar p_2 s_2}{p_1 - p_1 \bar p_2} = s^* \in \bar S,$$

a contradiction. Thus the first part of the lemma holds. For the second part, if $p_1 > p_2$ and $s^* \in \bar S$, then we have $s_1 = (1 - p_2/p_1)s^* + (p_2/p_1)s_2$ by the definition of $s^*$; therefore, for any $O = (1-e, e) \in \mathcal O_2$ we have

$$
\begin{aligned}
P_{\rm succ}(O) &= p_1\bigl(1 - e(s_1)\bigr) + p_2\, e(s_2)\\
&= p_1 - p_1\Bigl(\Bigl(1 - \frac{p_2}{p_1}\Bigr)\bar e(s^*) + \frac{p_2}{p_1}\, e(s_2)\Bigr) + p_2\, e(s_2)\\
&= p_1 - (p_1 - p_2)\,\bar e(s^*).
\end{aligned}
$$

Since $s^* \in \bar S$, we have $\bar e(s^*) \ge 0$; therefore $P_{\rm succ}(O) \le p_1$ for any $O \in \mathcal O_2$. This means that it is a non-generic case. Hence Lemma 5.1 holds. □
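In a finite classical theory both sides of Lemma 5.1 can be computed directly, which gives a quick sanity check. The sketch below assumes classical states are probability vectors, so that the optimal success probability is $\sum_x \max(p_1 s_1(x), p_2 s_2(x))$ and $s^*$ is a state exactly when none of its entries is negative (its entries always sum to 1). In this special case non-genericity is equivalent to $p_1 s_1(x) \ge p_2 s_2(x)$ for all $x$, i.e. to $s^* \in \bar S$; the code checks this on random data. All names are hypothetical.

```python
import random

def s_star(p1, p2, s1, s2):
    """s* = (p1 s1 - p2 s2) / (p1 - p2), defined for p1 > p2."""
    return [(p1 * a - p2 * b) / (p1 - p2) for a, b in zip(s1, s2)]

def optimal_success(p1, p2, s1, s2):
    """Classical optimal success probability: sum_x max(p1 s1(x), p2 s2(x))."""
    return sum(max(p1 * a, p2 * b) for a, b in zip(s1, s2))

def is_generic(p1, p2, s1, s2, tol=1e-12):
    """Generic case (Definition 4.4): the optimum strictly exceeds max(p1, p2)."""
    return optimal_success(p1, p2, s1, s2) > max(p1, p2) + tol

def s_star_is_state(p1, p2, s1, s2, tol=1e-12):
    """s* always sums to 1, so it is a classical state iff no entry is negative."""
    return all(v >= -tol for v in s_star(p1, p2, s1, s2))

def random_state(dim, rng):
    w = [rng.random() for _ in range(dim)]
    total = sum(w)
    return [v / total for v in w]

rng = random.Random(1)
for _ in range(1000):
    s1, s2 = random_state(3, rng), random_state(3, rng)
    p1 = rng.uniform(0.5, 1.0)
    p2 = 1.0 - p1
    if p1 > p2:
        # classical case: generic exactly when s* lies outside the state space
        assert is_generic(p1, p2, s1, s2) == (not s_star_is_state(p1, p2, s1, s2))
```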



Owing to this lemma, in what follows we assume that condition (4) in Lemma 5.1 is satisfied unless otherwise specified, in order to focus on generic cases. In the following subsections we prove that a Helstrom family always exists under this assumption, which is our main result in this article.
5.2 Auxiliary results



In this subsection we summarize, for later use, some known facts on topological vector spaces, together with some further properties. Our main reference is the book [18]. See also Sect. 2 for terminology.

First we list the following (special cases of the) facts presented in [18]:

Theorem 5.1 (Theorem 3.2 in [18, Chap. I]). Any $n$-dimensional Hausdorff t.v.s. with $n < \infty$ is isomorphic to the $n$-dimensional Euclidean space $\mathbb R^n$.

Proposition 5.1 (Proposition 3.3 in [18, Chap. I]). Let $W$ be a t.v.s. If $W'$ is a linear subspace of $W$ that is closed in $W$, and $W''$ is a finite-dimensional linear subspace of $W$, then $W' + W''$ is closed in $W$.

Proposition 5.2 (Proposition 3.4 in [18, Chap. I]). Every linear functional on a finite-dimensional Hausdorff t.v.s. is continuous.

The next theorem is a variant of the Hahn-Banach theorem. Here we use the following notion: a real-valued function $g$ on a vector space $W$ is called a semi-norm if $g(x+y) \le g(x) + g(y)$ for any $x, y \in W$ and $g(\lambda x) = |\lambda|\, g(x)$ for any $x \in W$ and $\lambda \in \mathbb R$. Then we have the following theorem:

Theorem 5.2 (Theorem 3.2 in [18, Chap. II]). Let $W$ be a vector space, $g$ a semi-norm on $W$, and $W'$ a linear subspace of $W$. If $f$ is a linear functional on $W'$ such that $|f(x)| \le g(x)$ for all $x \in W'$, then $f$ extends to a linear functional $\tilde f$ on $W$ such that $|\tilde f(x)| \le g(x)$ for all $x \in W$.

A subset $C$ of a vector space $W$ is called circled if $x \in C$ and $-1 \le \lambda \le 1$ imply $\lambda x \in C$, and radial if for any $x \in W$ there exists $\lambda_0 \in \mathbb R$ such that $x \in \lambda C$ whenever $|\lambda| \ge |\lambda_0|$. If $C$ is convex, radial and circled, then the Minkowski functional (or gauge) $g_C : W \to \mathbb R$ of $C$ is defined by

$$g_C(x) = \inf\{\lambda > 0 \mid x \in \lambda C\} \quad \text{for each } x \in W. \tag{5}$$
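Definition (5) can be illustrated numerically. The sketch below is a hypothetical helper, not taken from [18]: it approximates $g_C(x)$ by bisection on $\lambda$, using only a membership oracle for $C$. Bisection is valid because, for circled $C$, membership of $x$ in $\lambda C$ is monotone in $\lambda$. The two test sets are subsets of $\mathbb R^2$ whose gauges are known in closed form (the max norm for the square, the Euclidean norm for the disc), and the checks are consistent with the semi-norm property stated in Proposition 5.3.

```python
import math

def minkowski_gauge(contains, x, hi=1e6, iters=200):
    """Approximate g_C(x) = inf{lam > 0 : x in lam*C} by bisection, cf. (5).

    `contains(y)` is a membership oracle for a convex, circled, radial set C, and
    x is assumed to lie in hi*C.  If x is in lam*C and mu > lam, then
    x/mu = (lam/mu)(x/lam) is in C because C is circled, so membership is monotone in lam.
    """
    if all(v == 0 for v in x):
        return 0.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if contains([v / mid for v in x]):
            hi = mid
        else:
            lo = mid
    return hi

square = lambda y: max(abs(v) for v in y) <= 1.0   # C = [-1, 1]^2, gauge = max norm
disc = lambda y: math.hypot(*y) <= 1.0              # C = unit disc, gauge = Euclidean norm
```

For example, `minkowski_gauge(disc, [3.0, 4.0])` approximates $5$, the Euclidean norm of $(3, 4)$.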

Proposition 5.3 (Proposition 1.4 in [18, Chap. II]). The Minkowski functional $g_C$ of $C$ is a semi-norm on $W$.

We now present the following two properties of our minimal framework (see Theorem 2.1), which are consequences of the above facts:

Corollary 5.1. Every finite-dimensional affine subspace $W$ of the t.v.s. $V$ is closed in $V$ and is isomorphic to the Euclidean space $\mathbb R^n$ with $n = \dim W$. Hence $\bar S \cap W$ is a compact subset of $W$.

Proof. The compactness of $\bar S \cap W$ follows from the remaining parts, since $\bar S$ is compact and $W$ is closed in $V$. Since the topology $\mathcal T(V)$ of $V$ is invariant under any translation $x \mapsto x + x_0$, $x_0 \in V$, we may assume without loss of generality that $W$ is a linear subspace of $V$. Since $V$ is Hausdorff, the assertion $W \cong \mathbb R^n$ follows from Theorem 5.1, while the null subspace $\{0\}$ of $V$ is closed in $V$; therefore $W = \{0\} + W$ is also closed by Proposition 5.1. Hence Corollary 5.1 holds. □

Corollary 5.2. Let $W$ be a finite-dimensional affine subspace of $V$. Then any affine functional $f$ on $W$ extends to a continuous affine functional $\tilde f$ on $V$.

Proof. Fix an element $x_0 \in W$ and put $\alpha = f(x_0)$. Then the linear functional $g : x \mapsto f(x + x_0) - \alpha$ on the finite-dimensional linear subspace $W - x_0$ of $V$ is continuous by Proposition 5.2, since $V$ is Hausdorff. Moreover, since $V$ is l.c., a consequence of the Hahn-Banach theorem (Theorem 5.2) implies that this $g$ extends to some $\tilde g \in \mathcal L_c(V)$. Now the map $\tilde f$ defined by $\tilde f(x) = \tilde g(x - x_0) + \alpha$ is an affine extension of $f$, and $\tilde f$ is continuous since the translation $x \mapsto x - x_0$ is an isomorphism from $V$ to itself. Hence Corollary 5.2 holds. □
5.3 Candidates of conjugate states for Helstrom families

In this subsection, we investigate the candidates for the conjugate states $t_1, t_2$ of Helstrom families. For this purpose, we introduce some further notation. Recall that we have assumed condition (4). In the case $p_1 = p_2$, let $\mathcal C_{\rm weak}$ be the set of all pairs $(t_1, t_2)$ of distinct $t_1, t_2 \in \bar S$ such that the vector $\overrightarrow{t_1 t_2}$ is proportional to $\overrightarrow{s_2 s_1}$ (i.e., $t_2 - t_1 = c(s_1 - s_2)$ for some $c > 0$). On the other hand, in the case $p_1 > p_2$ and $s^* \notin \bar S$, let $\mathcal C_{\rm weak}$ be the set of all pairs $(t_1, t_2)$ of distinct $t_1, t_2 \in \bar S$ such that $t_2$ lies in the line segment $\overline{t_1 s^*} = {\rm Conv}(\{t_1, s^*\})$ between $t_1$ and $s^*$. Note that $\mathcal C_{\rm weak} \ne \emptyset$ since $(s_2, s_1) \in \mathcal C_{\rm weak}$. The next lemma shows that $\mathcal C_{\rm weak}$ is the set of pairs of conjugate states of weak Helstrom families:

Lemma 5.2. If $(\bar p_i, s_i; 1-\bar p_i, t_i)$, $i = 1, 2$, is a weak Helstrom family, then $(t_1, t_2) \in \mathcal C_{\rm weak}$. Conversely, if $(t_1, t_2) \in \mathcal C_{\rm weak}$, then there exist $0 < \bar p_i < 1$, $i = 1, 2$, such that $(\bar p_i, s_i; 1-\bar p_i, t_i)$, $i = 1, 2$, is a weak Helstrom family.

Proof. First, we consider the case $p_1 = p_2$. Then any weak Helstrom family satisfies $\bar p_1 = \bar p_2$; therefore (3) implies that $\bar p_1 < 1$ (otherwise we would have $s_1 = s = s_2$, contradicting $s_1 \ne s_2$) and $t_2 - t_1 = \bar p_1(s_1 - s_2)/(1-\bar p_1)$. Thus $(t_1, t_2) \in \mathcal C_{\rm weak}$. Conversely, if $(t_1, t_2) \in \mathcal C_{\rm weak}$, then $t_2 - t_1 = c(s_1 - s_2)$ for some $c > 0$, and this $c$ can be written as $c = \bar p/(1-\bar p)$ with $0 < \bar p < 1$. Now it follows that $(\bar p, s_i; 1-\bar p, t_i)$, $i = 1, 2$, is a weak Helstrom family. Thus the lemma holds in this case.

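Secondly, we consider the case $p_1 > p_2$ and $s^* \notin \bar S$. Then by (3), any weak Helstrom family satisfies $\bar p_2 = p_2/p = p_2 \bar p_1/p_1 < \bar p_1 \le 1$; therefore

$$t_2 = \frac{s - \bar p_2 s_2}{1-\bar p_2} = \frac{\bar p_1 s_1 + (1-\bar p_1)t_1 - \bar p_2 s_2}{1-\bar p_2} = \lambda t_1 + (1-\lambda)s^*, \quad \text{where } \lambda = \frac{1-\bar p_1}{1-\bar p_2}. \tag{6}$$

Now we have $0 \le \lambda < 1$ since $\bar p_2 < \bar p_1 \le 1$; therefore $t_2 \in \overline{t_1 s^*}$. Moreover, if $t_1 = t_2$, then (6) implies that $t_1 = s^*$, contradicting $t_1 \in \bar S$ and $s^* \notin \bar S$. Thus $(t_1, t_2) \in \mathcal C_{\rm weak}$. Conversely, if $(t_1, t_2) \in \mathcal C_{\rm weak}$, then we have $t_2 = \lambda t_1 + (1-\lambda)s^*$ for some $0 < \lambda < 1$ (note that $\lambda \ne 1$ since $t_1 \ne t_2$, and $\lambda \ne 0$ since $t_2 \ne s^*$), and now $(\bar p_i, s_i; 1-\bar p_i, t_i)$, $i = 1, 2$, is a weak Helstrom family for $\bar p_1 = (p_1 - p_1\lambda)/(p_1 - p_2\lambda)$ and $\bar p_2 = \bar p_1 p_2/p_1$. Hence Lemma 5.2 holds. □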
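The converse part of the second case can be checked on concrete vectors. The sketch below assumes a two-outcome classical theory with sample data chosen so that $s^* = (4.5, -3.5)$ lies outside the simplex; it builds $t_2 = \lambda t_1 + (1-\lambda)s^*$, computes $\bar p_1, \bar p_2$ from the formulas in the proof, and confirms that both binary ensembles share the same reference state, that the Helstrom ratios agree, and that $\lambda$ satisfies (6). The helper names are hypothetical.

```python
def family_from_pair(p1, p2, s1, s2, t1, lam):
    """Converse part of Lemma 5.2 for p1 > p2 (sketch): given t1 and 0 < lam < 1, put
    t2 = lam*t1 + (1 - lam)*s* and return (bar_p1, bar_p2, t2)."""
    s_star = [(p1 * a - p2 * b) / (p1 - p2) for a, b in zip(s1, s2)]
    t2 = [lam * u + (1 - lam) * v for u, v in zip(t1, s_star)]
    bar_p1 = (p1 - p1 * lam) / (p1 - p2 * lam)
    bar_p2 = bar_p1 * p2 / p1
    return bar_p1, bar_p2, t2

def mix(w, a, b):
    """Convex combination w*a + (1 - w)*b of two vectors."""
    return [w * u + (1 - w) * v for u, v in zip(a, b)]

p1, p2 = 0.55, 0.45
s1, s2 = [0.9, 0.1], [0.1, 0.9]   # here s* = [4.5, -3.5] lies outside the simplex
t1, lam = [0.0, 1.0], 0.9         # then t2 = [0.45, 0.55] is a state
bar_p1, bar_p2, t2 = family_from_pair(p1, p2, s1, s2, t1, lam)

# both binary ensembles must have the same reference state s, as required by (3)
ref1, ref2 = mix(bar_p1, s1, t1), mix(bar_p2, s2, t2)
```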



By this lemma and Corollary 4.1, to find a Helstrom family it suffices to find a pair $(t_1, t_2) \in \mathcal C_{\rm weak}$ such that $t_1$ and $t_2$ are distinguishable by a virtual effect $\bar e \in \bar{\mathcal E}$ (see Definition 4.3 for terminology). The existence of such a pair $(t_1, t_2)$ is proven along the following outline:

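1. Define a function $\ell : \mathcal C'_{\rm weak} \to \mathbb R$, where

$$\mathcal C'_{\rm weak} = \mathcal C_{\rm weak} \cup \{(t, t) \mid t \in \bar S\} \subset \bar S \times \bar S,$$

such that $\ell \ge 0$ and $\ell(t_1, t_2) = 0$ if and only if $t_1 = t_2$; hence $\ell > 0$ on $\mathcal C_{\rm weak}$.

2. Prove that $\mathcal C'_{\rm weak}$ is closed in $\bar S \times \bar S$; hence $\mathcal C'_{\rm weak}$ is compact since $\bar S \times \bar S$ is.

3. Prove that $\ell$ is continuous; hence $\ell$ takes its maximum value at some pair $(t_1, t_2)$ in $\mathcal C_{\rm weak}$ (see the first step).

4. Prove that $t_1$ and $t_2$ are distinguishable.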
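To make the four steps concrete, here is a minimal sketch for the simplest special case that can be handled in closed form: a finite classical theory with $p_1 = p_2 = 1/2$. In that case membership in $\mathcal C_{\rm weak}$ means $t_2 - t_1 = c\,(s_1 - s_2)$ with $c > 0$, and the coefficient $c$ is the quantity to be maximised (it is formalized as $\ell$ below). Since $\sum_x (t_2 - t_1)_+ \le \sum_x t_2 = 1$, one gets $c \le 1/D_c(s_1, s_2)$, and the normalised positive and negative parts of $s_1 - s_2$ attain this bound. Steps 2 and 3 are therefore not needed numerically here; the code checks Step 4 and compares the resulting Helstrom ratio $p = p_1/\bar p$, with $c = \bar p/(1-\bar p)$ as in the proof of Lemma 5.2, against the classical optimum. The function names and data are hypothetical.

```python
def maximal_pair_equiprobable(s1, s2):
    """Classical equiprobable case (sketch): the pair in C_weak with the largest coefficient.

    Since t2 - t1 = c*(s1 - s2) and sum_x (t2 - t1)_+ <= sum_x t2 = 1, we have
    c <= 1 / D_c(s1, s2); the normalised positive and negative parts of s1 - s2
    attain this bound.  Assumes s1 != s2.
    """
    dc = sum(max(a - b, 0.0) for a, b in zip(s1, s2))   # classical trace distance D_c(s1, s2)
    t1 = [max(b - a, 0.0) / dc for a, b in zip(s1, s2)]
    t2 = [max(a - b, 0.0) / dc for a, b in zip(s1, s2)]
    return 1.0 / dc, t1, t2

def evaluate(effect, state):
    return sum(e * x for e, x in zip(effect, state))

s1, s2 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
ell, t1, t2 = maximal_pair_equiprobable(s1, s2)
effect = [1.0 if b > a else 0.0 for a, b in zip(s1, s2)]   # Step 4: e(t1) = 1, e(t2) = 0
bar_p = ell / (1.0 + ell)                                  # from ell = bar_p / (1 - bar_p)
helstrom_ratio = 0.5 / bar_p                               # p = p_1 / bar_p with p_1 = 1/2
p_opt = 0.5 * sum(max(a, b) for a, b in zip(s1, s2))       # classical optimum for p1 = p2 = 1/2
```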

From now on, we carry out this program. In what follows, for a t.v.s. $W$, let $\mathcal L(W)$, $\mathcal L_c(W)$, $\mathcal A(W)$ and $\mathcal A_c(W)$ denote, respectively, the sets of linear functionals on $W$, of continuous linear functionals on $W$, of affine functionals on $W$, and of continuous affine functionals on $W$.
For the first step of the program, we define the function $\ell : \mathcal C'_{\rm weak} \to \mathbb R$ as follows. In the case $p_1 = p_2$, define $\ell(t_1, t_2)$ by

$$t_2 - t_1 = \ell(t_1, t_2)\,(s_1 - s_2) \quad \text{for } (t_1, t_2) \in \mathcal C'_{\rm weak}.$$

On the other hand, in the case $p_1 > p_2$ and $s^* \notin \bar S$, define $\ell(t_1, t_2)$ by

$$t_2 = \ell(t_1, t_2)\, s^* + \bigl(1 - \ell(t_1, t_2)\bigr)\, t_1 \quad \text{for } (t_1, t_2) \in \mathcal C'_{\rm weak}$$

(thus $0 \le \ell < 1$; note that $\ell \ne 1$ since $t_2 \ne s^*$). This $\ell$ has the properties specified in the first step. Note that $\ell(t_1, t_2)$ becomes larger if and only if $t_1$ and $t_2$ become "far" from each other in the space $\bar S$ (in an intuitive sense; this becomes a strict sense at least in finite-dimensional cases, since there $\bar S$ admits the Euclidean metric); hence our program of making the value $\ell(t_1, t_2)$ as large as possible also fits the strategy mentioned in Example 4.1 for decreasing the Helstrom ratio. Namely, in the case $p_1 = p_2$, the definition of $\ell$ intuitively implies that $\ell(t_1, t_2)$ is the "distance" between $t_1$ and $t_2$, normalized so that the "distance" between $s_1$ and $s_2$ is 1. On the other hand, in the case $p_1 > p_2$ and $s^* \notin \bar S$, the definition of $\ell$ implies that

$$t_2 - t_1 = \frac{\ell(t_1, t_2)}{1 - \ell(t_1, t_2)}\,(s^* - t_2),$$

therefore $\ell(t_1, t_2)/(1 - \ell(t_1, t_2))$, which is increasing in $\ell(t_1, t_2)$, is the "distance" between $t_1$ and $t_2$, normalized so that the "distance" between $t_2$ and $s^*$ is 1.
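For the case $p_1 > p_2$ the value $\ell(t_1, t_2)$ can be read off coordinatewise. The sketch below uses a hypothetical helper and two-outcome classical sample data with $s^* = (4.5, -3.5)$ (coming from $p_1 = 0.55$, $p_2 = 0.45$, $s_1 = (0.9, 0.1)$, $s_2 = (0.1, 0.9)$). It computes $\ell$ and checks the identity $t_2 - t_1 = \frac{\ell}{1-\ell}(s^* - t_2)$ derived from the definition of $\ell$.

```python
def ell_from_pair(t1, t2, s_star):
    """ell(t1, t2) defined by t2 = ell*s* + (1 - ell)*t1 (case p1 > p2, s* outside the state space).

    Reads ell off the coordinate where s* and t1 differ most; assumes t2 lies on the
    segment from t1 towards s*.
    """
    x = max(range(len(t1)), key=lambda k: abs(s_star[k] - t1[k]))
    return (t2[x] - t1[x]) / (s_star[x] - t1[x])

s_star = [4.5, -3.5]
t1, t2 = [0.0, 1.0], [0.45, 0.55]
ell = ell_from_pair(t1, t2, s_star)
ratio = ell / (1 - ell)
# t2 - t1 = ell/(1 - ell) * (s* - t2): the normalised "distance" described above
residual = [(b - a) - ratio * (c - b) for a, b, c in zip(t1, t2, s_star)]
```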
For the second step, we have the following result:

Lemma 5.3. Let $(t_1, t_2) \in \bar S \times \bar S$.

1. If $p_1 = p_2$, then $(t_1, t_2) \in \mathcal C'_{\rm weak}$ if and only if $\bar e(t_1) \ge \bar e(t_2)$ for every $\bar e \in \bar{\mathcal E}$ such that $\bar e(s_1) \le \bar e(s_2)$.

2. If $p_1 > p_2$, then $(t_1, t_2) \in \mathcal C'_{\rm weak}$ if and only if $f(t_1) \le f(t_2) \le f(s^*)$ or $f(t_1) \ge f(t_2) \ge f(s^*)$ for every $f \in \mathcal A_c(V)$ such that $f|_{\bar S} \in \bar{\mathcal E}$.

Proof. Since the case $t_1 = t_2$ is trivial, we assume from now on that $t_1 \ne t_2$.
For the first part, if $(t_1, t_2) \in \mathcal C_{\rm weak}$, then Lemma 5.2 implies that

$$s = \bar p s_1 + (1-\bar p)t_1 = \bar p s_2 + (1-\bar p)t_2 \quad \text{for some } 0 < \bar p < 1 \text{ and } s \in \bar S$$

(recall that $s_1 \ne s_2$). Now for any $\bar e \in \bar{\mathcal E}$ we have

$$\bar e(s) = \bar p\, \bar e(s_1) + (1-\bar p)\, \bar e(t_1) = \bar p\, \bar e(s_2) + (1-\bar p)\, \bar e(t_2),$$

therefore $\bar e(t_1) \ge \bar e(t_2)$ whenever $\bar e(s_1) \le \bar e(s_2)$. On the other hand, if $(t_1, t_2) \notin \mathcal C'_{\rm weak}$, then either $(t_2, t_1) \in \mathcal C_{\rm weak}$, or $\overrightarrow{t_1 t_2} = t_2 - t_1$ is not parallel to the line ${\rm Aff}(\{s_1, s_2\})$ containing $s_1$ and $s_2$. In the former case, we have $\bar e(s_1) < \bar e(s_2)$ for some $\bar e \in \bar{\mathcal E}$ since $S$ is separated (note that $1 - \bar e \in \bar{\mathcal E}$ and $1 - \bar e(s_1) < 1 - \bar e(s_2)$ whenever $\bar e \in \bar{\mathcal E}$ and $\bar e(s_1) > \bar e(s_2)$); therefore we have $\bar e(t_2) > \bar e(t_1)$ in the same way as above. In the latter case, it is easy to show that $f(s_1) = f(s_2)$ and $f(t_1) < f(t_2)$ for some affine functional $f$ on the affine hull of $\{s_1, s_2, t_1, t_2\}$, and Corollary 5.2 implies that this $f$ extends to an $\tilde f \in \mathcal A_c(V)$. Now $\tilde f(\bar S)$ is bounded in $\mathbb R$ since $\bar S$ is compact. Thus, by choosing $\alpha > 0$ and $\beta \in \mathbb R$ appropriately, the continuous affine functional $g = \alpha \tilde f + \beta$ on $V$ satisfies $g(s_1) = g(s_2)$, $g(t_1) < g(t_2)$ and $g(\bar S) \subset [0, 1]$; therefore $\bar e = g|_{\bar S}$ is a virtual effect satisfying $\bar e(s_1) = \bar e(s_2)$ and $\bar e(t_1) < \bar e(t_2)$. Thus the first part of Lemma 5.3 holds.

For the second part, note that $s^* \notin \bar S$ by assumption (4). The "only if" part is now trivial by the definition of $\mathcal C'_{\rm weak}$. To prove the "if" part, assume that $(t_1, t_2) \notin \mathcal C'_{\rm weak}$. Then $t_1 \ne t_2$, and we have either $t_1 \in \overline{t_2 s^*}$, or $\overrightarrow{t_1 t_2}$ is not parallel to the line ${\rm Aff}(\{t_1, s^*\})$ (note that $s^* \notin \overline{t_1 t_2}$ since $s^* \notin \bar S$). In the former case, since $V$ is Hausdorff, there exists an $f \in \mathcal L_c(V)$ such that $f(t_2) < f(t_1)$. Now since $\bar S$ is compact, an appropriate transformation $g = \alpha f + \beta$ with $\alpha > 0$ and $\beta \in \mathbb R$ satisfies $g(\bar S) \subset [0, 1]$ (hence $g|_{\bar S} \in \bar{\mathcal E}$) and $g(t_2) < g(t_1)$; therefore $g(t_1) < g(s^*)$, since $t_1 \in \overline{t_2 s^*}$ and $t_1 \ne s^*$. On the other hand, in the latter case, we have $f(t_1) = f(s^*) < f(t_2)$ for some affine functional $f$ on the affine hull of $\{t_1, t_2, s^*\}$, and Corollary 5.2 implies that this $f$ extends to an $\tilde f \in \mathcal A_c(V)$. Now $\tilde f(\bar S)$ is bounded in $\mathbb R$ since $\bar S$ is compact. Thus, by taking an appropriate affine transformation of $\tilde f$ in the same way as above, it follows that $g(t_1) = g(s^*) < g(t_2)$ for some $g \in \mathcal A_c(V)$ such that $g|_{\bar S} \in \bar{\mathcal E}$. Hence the second part of Lemma 5.3 holds, concluding the proof of Lemma 5.3. □



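By this lemma, $\mathcal C'_{\rm weak}$ is closed in $\bar S \times \bar S$ as desired, since the virtual effect $\bar e \in \bar{\mathcal E}$ corresponding to each $e \in \mathcal E$ is continuous on $\bar S$, and every $f \in \mathcal A_c(V)$ is continuous.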

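For the third step, we have the following result:

Lemma 5.4. The function $\ell$ on $\mathcal C'_{\rm weak}$ is continuous.

Proof. First, we consider the case $p_1 = p_2$. Fix $\bar e \in \bar{\mathcal E}$ such that $\bar e(s_1) > \bar e(s_2)$ (this is possible since $S$ is separated), and put $c = (\bar e(s_1) - \bar e(s_2))^{-1} > 0$. For any $(t_1, t_2) \in \mathcal C'_{\rm weak}$, Lemma 5.2 implies that there exists a $\bar p \in [0, 1)$ such that $\bar p s_1 + (1-\bar p)t_1 = \bar p s_2 + (1-\bar p)t_2 \in \bar S$. Now we have $\ell(t_1, t_2) = \bar p/(1-\bar p)$ and $\bar p\, \bar e(s_1) + (1-\bar p)\, \bar e(t_1) = \bar p\, \bar e(s_2) + (1-\bar p)\, \bar e(t_2)$; therefore

$$\ell(t_1, t_2) = \frac{\bar p}{1-\bar p} = \frac{\bar e(t_2) - \bar e(t_1)}{\bar e(s_1) - \bar e(s_2)} = c\,\bigl(\bar e(t_2) - \bar e(t_1)\bigr).$$

This implies that $\ell$ is continuous, since $\bar e$ is continuous.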

Secondly, we consider the case that $p_1 > p_2$ and $s^* \notin \bar S$. Let $\mathcal F$ be the set of all $f \in \mathcal A_c(V)$ such that $f|_{\bar S} \in \bar{\mathcal E}$. Now for each $f \in \mathcal F$, put

$$A_f = \{(t_1, t_2) \in \bar S \times \bar S \mid f(t_1) \ne f(s^*)\} \subset \bar S \times \bar S$$

and define a function $g_f : A_f \to \mathbb R$ by

$$g_f(t_1, t_2) = \frac{f(t_2) - f(t_1)}{f(s^*) - f(t_1)} \quad \text{for } (t_1, t_2) \in A_f.$$

Since $f$ is continuous, $A_f$ is open in $\bar S \times \bar S$ and $g_f$ is continuous. Moreover, we have $\ell(t_1, t_2) = g_f(t_1, t_2)$ for any $(t_1, t_2) \in \mathcal C'_{\rm weak} \cap A_f$ by the definition of $\ell$. Now we show that

$$\ell^{-1}(U) = \bigcup_{f \in \mathcal F}\bigl(\mathcal C'_{\rm weak} \cap g_f^{-1}(U)\bigr) \quad \text{for any open subset } U \subset \mathbb R.$$

Once this is proven, $\ell^{-1}(U)$ is open in $\mathcal C'_{\rm weak}$ since each $g_f^{-1}(U) \subset A_f$ is an open subset of $\bar S \times \bar S$ (recall that $A_f$ is open in $\bar S \times \bar S$); therefore the continuity of $\ell$ follows. Since $\ell$ and $g_f$ agree on $\mathcal C'_{\rm weak} \cap A_f$ as above, the inclusion $\supseteq$ holds immediately. For the other inclusion, let $(t_1, t_2) \in \mathcal C'_{\rm weak}$ be such that $\ell(t_1, t_2) \in U$. Let $W$ denote the line ${\rm Aff}(\{s_1, s_2\})$. Now if $t_1 \notin W$, then an argument similar to the proof of Lemma 5.3 (based on Corollary 5.2) implies the existence of an $f \in \mathcal F$ such that $f$ is constant on $W$ and $f(t_1) \ne f(s_1)$; hence $f(s^*) = f(s_1) \ne f(t_1)$ (note that $s^* \in W$). On the other hand, suppose that $t_1 \in W$. Since $V$ is Hausdorff, there exists an $f \in \mathcal A_c(V)$ such that $f(s_1) \ne f(s_2)$. Now, by a similar argument as above, this $f$ can be chosen from $\mathcal F$. Since the four points $s_1$, $s_2$, $s^*$ and $t_1$ are all collinear and $t_1 \ne s^*$, the fact $f(s_1) \ne f(s_2)$ implies that $f(s^*) \ne f(t_1)$. Thus $(t_1, t_2) \in A_f$ in either case, while $\ell$ and $g_f$ agree on $\mathcal C'_{\rm weak} \cap A_f$; therefore $g_f(t_1, t_2) = \ell(t_1, t_2) \in U$ by the above argument. Hence the inclusion $\subseteq$ follows, and therefore Lemma 5.4 holds. □
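The two formulas for $\ell$ used in this proof can be checked numerically in the classical setting. The sketch below (hypothetical helper names and sample data) verifies that $c\,(\bar e(t_2) - \bar e(t_1))$ does not depend on the choice of an effect with $\bar e(s_1) > \bar e(s_2)$, and that $g_f$ returns the same value $\ell$ for every effect $f$ with $f(t_1) \ne f(s^*)$. Affine functionals on the affine span of the simplex are represented here by vectors acting through the inner product, an assumption that covers $s^*$ as well because its coordinates also sum to 1.

```python
import random

def value(f, v):
    """Affine functional on the affine span of the simplex, represented by a vector f."""
    return sum(a * b for a, b in zip(f, v))

def ell_equiprobable(e, s1, s2, t1, t2):
    """Case p1 = p2: ell = c (e(t2) - e(t1)) with c = 1 / (e(s1) - e(s2)) > 0."""
    return (value(e, t2) - value(e, t1)) / (value(e, s1) - value(e, s2))

def g_f(f, t1, t2, s_star):
    """Case p1 > p2: g_f(t1, t2) = (f(t2) - f(t1)) / (f(s*) - f(t1)), for f(t1) != f(s*)."""
    return (value(f, t2) - value(f, t1)) / (value(f, s_star) - value(f, t1))

rng = random.Random(2)

# p1 = p2: here t2 - t1 = 1 * (s1 - s2), so ell = 1 for every admissible effect e
s1, s2 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
t1, t2 = [0.1, 0.4, 0.5], [0.4, 0.4, 0.2]
for _ in range(200):
    e = [rng.random() for _ in range(3)]
    if value(e, s1) - value(e, s2) > 1e-3:
        assert abs(ell_equiprobable(e, s1, s2, t1, t2) - 1.0) < 1e-9

# p1 > p2: every f with f(t1) != f(s*) gives the same value ell = 0.1
s_star, u1, u2 = [4.5, -3.5], [0.0, 1.0], [0.45, 0.55]
for _ in range(200):
    f = [rng.random() for _ in range(2)]
    if abs(value(f, s_star) - value(f, u1)) > 1e-3:
        assert abs(g_f(f, u1, u2, s_star) - 0.1) < 1e-9
```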
For the final step, let $\mathcal C$ be the subset of $\mathcal C'_{\rm weak}$ consisting of all pairs at which $\ell$ takes its maximum value:

$$\mathcal C = \Bigl\{(t_1, t_2) \in \mathcal C'_{\rm weak} \Bigm| \ell(t_1, t_2) = \max_{(t'_1, t'_2) \in \mathcal C'_{\rm weak}} \ell(t'_1, t'_2)\Bigr\}.$$

Note that $\emptyset \ne \mathcal C \subset \mathcal C_{\rm weak}$ by the above argument. From now on, we show that for any $(t_1, t_2) \in \mathcal C_{\rm weak}$, $t_1$ and $t_2$ are distinguishable if and only if $(t_1, t_2) \in \mathcal C$; in particular, a Helstrom family exists.
First, one direction of this assertion is proven as follows:

Proposition 5.4. If $(t_1, t_2) \in \mathcal C_{\rm weak}$ and $t_1$ and $t_2$ are distinguishable, then $(t_1, t_2) \in \mathcal C$. Hence the pair of conjugate states $t_1, t_2$ in any Helstrom family belongs to $\mathcal C$.

Proof. The latter part follows by combining the former part with Lemma 5.2, Lemma 5.1 and Corollary 4.2. To prove the former part, assume to the contrary that $t_1$ and $t_2$ in $\bar S$ are distinguishable by a virtual effect $\bar e \in \bar{\mathcal E}$ and $(t_1, t_2) \in \mathcal C_{\rm weak}$, but $\ell(t_1, t_2) < \ell(t'_1, t'_2)$ for some $(t'_1, t'_2) \in \mathcal C_{\rm weak}$. Since ${\rm Aff}(S) = V$, this $\bar e$ extends to an $f \in \mathcal A(V)$. Let $W = {\rm Aff}(\{t_1, t_2, t'_1, t'_2\})$. Then, by Corollary 5.1, $W$ is isomorphic to a Euclidean space $\mathbb R^n$ with $n = \dim W$, and $\bar S_W = \bar S \cap W$ is a compact convex subset of $W$. Since $(t_1, t_2), (t'_1, t'_2) \in \mathcal C_{\rm weak}$, we have $n \le 2$ by the definition of $\mathcal C_{\rm weak}$. Now $H_1 = W \cap f^{-1}(1)$ and $H_2 = W \cap f^{-1}(0)$ are parallel supporting hyperplanes of $\bar S_W$ in $W$ at $t_1$ and at $t_2$, respectively, and $\bar S_W$ lies between $H_1$ and $H_2$. Note that $t_1, t_2, t'_1, t'_2 \in \bar S_W$.

Now, in the case $p_1 = p_2$, the segment $\overline{t'_1 t'_2}$ is parallel to $\overline{t_1 t_2}$ since $(t_1, t_2), (t'_1, t'_2) \in \mathcal C_{\rm weak}$. Thus it is geometrically obvious that $|t'_1 t'_2| \le |t_1 t_2|$ (where $|xy|$ denotes the distance between $x$ and $y$ in the Euclidean metric on $\mathbb R^n$), since the two intersection points of the line ${\rm Aff}(\{t'_1, t'_2\})$ with $H_1$ and with $H_2$, respectively, together with $t_1$ and $t_2$, form a parallelogram (see Fig. 2(a)). Since $\ell(t_1, t_2) = |t_1 t_2|/|s_1 s_2|$ in this case, this contradicts the assumption $\ell(t'_1, t'_2) > \ell(t_1, t_2)$.

On the other hand, consider the case $p_1 > p_2$ and $s^* \notin \bar S$. Note that $s^* \in W$ since $(t_1, t_2) \in \mathcal C_{\rm weak}$. Then the assumption $\ell(t_1, t_2) < \ell(t'_1, t'_2)$ implies that $|t'_2 t'_1|/|s^* t'_2| > |t_2 t_1|/|s^* t_2|$; in particular, neither $t'_1$ nor $t'_2$ lies on the line segment $\overline{t_1 t_2}$. Let $v_1$ and $v_2$ be the intersection points of the line ${\rm Aff}(\{t'_1, t'_2\})$ with $H_1$ and with $H_2$, respectively (see Fig. 2(b)). Then we have

$$\frac{|v_2 v_1|}{|s^* v_2|} \ge \frac{|t'_2 t'_1|}{|s^* t'_2|} > \frac{|t_2 t_1|}{|s^* t_2|}.$$

However, since $H_1$ and $H_2$ are parallel, the two triangles $\triangle s^* v_1 t_1$ and $\triangle s^* v_2 t_2$ are similar; therefore $|v_2 v_1|/|s^* v_2| = |t_2 t_1|/|s^* t_2|$, a contradiction.

Thus a contradiction occurs in both cases. Hence Proposition 5.4 holds. □




Figure 2: The cases (a) $p_1 = p_2$ and (b) $p_1 > p_2$, $s^* \notin \bar S$ in Proposition 5.4.

Now we are in a position to state our main theorem in this article, which will be proven in the next subsection:

Theorem 5.3. If $(t_1, t_2) \in \mathcal C$, then $t_1$ and $t_2$ are distinguishable. Hence, by the above argument, a Helstrom family always exists under assumption (4).

Before starting the proof, we note the following: once Theorem 5.3 is proven, the hypothesis "and a Helstrom family exists" in the first part of Lemma 5.1 becomes redundant; therefore we obtain the following simple criterion for generic cases in two-state discrimination problems, which improves Lemma 5.1:

Theorem 5.4. Under the assumption $p_1 \ge p_2$, condition (4) is necessary and sufficient for the case to be generic.

In particular, the equiprobable case $p_1 = p_2 = 1/2$ is always a generic case; therefore, in such a case we can always make a correct guess with probability strictly higher than 1/2 by using an appropriate observable.

We also mention another nontrivial consequence of Theorem 5.3, which shows interesting relations between optimal success probabilities for equiprobable two-state discrimination problems and Gudder's metric functions on the state space (cf. Remark 2.3):

Remark 5.1. First, we translate the definition of Gudder's metric function [9] on compact state spaces into our framework, where the real state space $S$ is not necessarily compact. For $s'_1, s'_2 \in S$, define $d(s'_1, s'_2)$ to be the infimum of those $0 \le \lambda \le 1/2$ such that

$$\lambda t_1 + (1-\lambda)s'_1 = \lambda t_2 + (1-\lambda)s'_2 \quad \text{for some } t_1, t_2 \in \bar S$$

(note that $\lambda = 1/2$, $t_1 = s'_2$, $t_2 = s'_1$ always satisfy this condition). This function $d$ is a metric on $S$, and this definition coincides with Gudder's original definition in the case $\bar S = S$ (i.e., when $S$ is compact). Now the above condition is equivalent to the condition that $(1-\lambda, s'_i; \lambda, t_i)$, $i = 1, 2$, is a weak Helstrom family for the states $s'_1, s'_2$ and a priori probabilities $p_i = 1/2$, with Helstrom ratio $p = 1/(2 - 2\lambda)$. Thus minimizing $\lambda$ is equivalent to minimizing $p$, and Theorem 5.3 implies that the infimum $d(s'_1, s'_2)$ of such $\lambda$ is attained by some Helstrom family, with Helstrom ratio $p = P^{\rm opt}_{\rm succ}(s'_1, s'_2)$, where $P^{\rm opt}_{\rm succ}(s'_1, s'_2)$ denotes the optimal success probability for discriminating $s'_1$ and $s'_2$ in the equiprobable case. Thus we have the nontrivial relation

$$d(s'_1, s'_2) = 1 - \frac{1}{2P^{\rm opt}_{\rm succ}(s'_1, s'_2)} \quad \text{for any } s'_1, s'_2 \in S. \tag{7}$$

In particular, it follows that the function of $s'_1, s'_2$ on the right-hand side is a metric on $S$. It seems infeasible to derive this fact just from the intuitive meaning of "optimal success probability of state discrimination".

On the other hand, Gudder also defined another metric function on the same state space, called the "intrinsic metric", by using the former metric function $d$ as a building block. According to Gudder's definition, we put

$$\bar d(s'_1, s'_2) = \frac{d(s'_1, s'_2)}{1 - d(s'_1, s'_2)} \quad \text{for } s'_1, s'_2 \in S.$$

The concrete structure of the above metric $d$ implies that $\bar d$ is indeed a metric function and $0 \le \bar d \le 1$. Moreover, it follows from (7) that

$$\bar d(s'_1, s'_2) = 2P^{\rm opt}_{\rm succ}(s'_1, s'_2) - 1 \quad \text{for } s'_1, s'_2 \in S. \tag{8}$$

This shows an operational meaning of Gudder's intrinsic metric that has not been pointed out in the literature. Moreover, comparing (8) with the well-known formula $P^{\rm opt}_{\rm succ}(\rho_1, \rho_2) = 1/2 + D(\rho_1, \rho_2)/2$ for quantum states $\rho_1, \rho_2$, where $D(\rho_1, \rho_2)$ denotes the trace distance, we see that Gudder's intrinsic metric coincides with the trace distance in quantum cases. Hence we have obtained an operationally natural generalization of the trace distance to general probabilistic theories.

Moreover, it is in fact possible to define the "trace distance" in general probabilistic theories directly through the classical trace distance:

$$D(s'_1, s'_2) = \sup_{O = (e_i)_i \in \mathcal O} D_c\bigl(e_i(s'_1), e_i(s'_2)\bigr) \quad \text{for } s'_1, s'_2 \in S, \tag{9}$$

where $D_c(p_i, q_i)$ denotes the classical trace distance ($L_1$ distance or Kolmogorov distance) [15] between probability distributions $p_i$ and $q_i$:

$$D_c(p_i, q_i) = \frac{1}{2}\sum_i |p_i - q_i|,$$

and $\mathcal O = \bigcup_N \mathcal O_N$ denotes the set of all discrete observables. (The argument below shows that the supremum in (9) is always attained by some observable, which can be chosen from two-valued observables.) The classical trace distance is the maximal difference of probabilities between $p_i$ and $q_i$ over all events $\Delta$, i.e., $D_c(p_i, q_i) = \max_\Delta |p(\Delta) - q(\Delta)| = \max_\Delta \bigl|\sum_{i \in \Delta} p_i - \sum_{i \in \Delta} q_i\bigr|$, and is therefore considered an operationally natural distance between probability distributions. To distinguish states $s'_1$ and $s'_2$ in general probabilistic theories, the best one can do is to find the observable $O = (e_i)_i \in \mathcal O$ that best captures the difference between $s'_1$ and $s'_2$ by comparing the probability distributions $e_i(s'_1)$ and $e_i(s'_2)$. Thus we are led to the definition (9) of the distance between states; namely, $D(s'_1, s'_2)$ has the same operational meaning as the Kolmogorov distance, optimized over all observables. From now on, we show that Gudder's intrinsic metric (8) is in fact the same as our trace distance (9).

For this purpose, we first show that in our trace distance it suffices to consider just two-valued observables $O = (e_i)_i \in \mathcal O_2$, namely:

$$D(s'_1, s'_2) = \sup_{O = (e_i)_i \in \mathcal O_2} D_c\bigl(e_i(s'_1), e_i(s'_2)\bigr). \tag{10}$$

(Here the supremum is attained by some observable due to the compactness of $\mathcal O_2$ and the continuity of $D_c(e_i(s'_1), e_i(s'_2))$; see the proof of Theorem 3.1.) To prove (10), note that one can associate to any $O = (e_i)_i \in \mathcal O$ a two-valued observable $(e'_+, e'_-) \in \mathcal O_2$ with $e'_+ = \sum_{i \in N_+} e_i$ and $e'_- = 1 - e'_+$, where $N_+ = \{i \mid e_i(s'_1) > e_i(s'_2)\}$. By definition, we have $D_c(e_i(s'_1), e_i(s'_2)) = D_c(e'_\pm(s'_1), e'_\pm(s'_2))$, since both sides equal $\sum_{i \in N_+}[e_i(s'_1) - e_i(s'_2)]$. This implies that the right-hand side of (10) is greater than or equal to the right-hand side of (9), while the opposite inequality obviously holds (since $\mathcal O_2 \subset \mathcal O$). Hence (10) holds. Note that this argument also provides another simple expression of our trace distance $D(s'_1, s'_2)$:

$$D(s'_1, s'_2) = \sup_{e \in \mathcal E}\bigl[e(s'_1) - e(s'_2)\bigr], \tag{11}$$

where the supremum is again attained by some effect due to the compactness of $\mathcal E$ (see the proof of Theorem 3.1).
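In a finite classical theory the quantities in (9), (10) and (11) can be computed directly. The sketch below uses hypothetical helpers, and the observable and states are arbitrary sample data. It checks that the coarse-graining to $(e'_+, e'_-)$ used in the proof of (10) preserves the classical trace distance, and that the supremum in (11) equals $D_c(s'_1, s'_2)$ for classical states. The supremum is evaluated over the 0/1 vertices of the effect cube, which suffices because $e \mapsto e(s'_1) - e(s'_2)$ is linear.

```python
from itertools import product

def d_c(p, q):
    """Classical trace distance D_c(p, q) = (1/2) sum_i |p_i - q_i|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def value(e, s):
    return sum(a * b for a, b in zip(e, s))

def outcome_distributions(observable, s1, s2):
    """Probability distributions (e_i(s1))_i and (e_i(s2))_i of an observable."""
    return [value(e, s1) for e in observable], [value(e, s2) for e in observable]

def coarse_grain(observable, s1, s2):
    """Two-valued observable (e'_+, e'_-) with e'_+ = sum of e_i over N_+ = {i : e_i(s1) > e_i(s2)}."""
    p, q = outcome_distributions(observable, s1, s2)
    plus = [sum(e[x] for e, a, b in zip(observable, p, q) if a > b) for x in range(len(s1))]
    return [plus, [1.0 - v for v in plus]]

s1, s2 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
obs = [[0.2, 0.5, 0.1], [0.3, 0.5, 0.6], [0.5, 0.0, 0.3]]   # columns sum to 1
d_obs = d_c(*outcome_distributions(obs, s1, s2))
d_two = d_c(*outcome_distributions(coarse_grain(obs, s1, s2), s1, s2))

# In a finite classical theory e -> e(s1) - e(s2) is linear on the cube [0, 1]^n,
# so the supremum in (11) is attained at a 0/1 vertex.
sup_11 = max(value(e, s1) - value(e, s2) for e in product([0.0, 1.0], repeat=len(s1)))
```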

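Now it is not difficult to see that Gudder's intrinsic metric (8) is indeed the same as our trace distance (9). To see this, just observe that for $s'_1, s'_2 \in S$ with a priori probabilities $p_1 = p_2 = 1/2$, the definition of the optimal success probability (applied to two-valued observables $O = (e, 1-e) \in \mathcal O_2$) gives

$$P^{\rm opt}_{\rm succ}(s'_1, s'_2) = \frac{1}{2}\Bigl(1 + \sup_{e \in \mathcal E}\bigl[e(s'_1) - e(s'_2)\bigr]\Bigr).$$

Substituting this into (8) and using (11), we obtain the desired relation:

$$\bar d(s'_1, s'_2) = D(s'_1, s'_2). \tag{12}$$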
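For qubits, relations (7), (8) and (12) can be cross-checked numerically. This sketch assumes two standard facts about qubits that are not proven in this article: states correspond to Bloch vectors, and the trace distance equals half the Euclidean distance between Bloch vectors. Combined with the Helstrom formula quoted above, it recovers Gudder's first metric $d$ from (7), checks that $0 \le d \le 1/2$, and confirms that (7) and (8) together give $\bar d = d/(1-d) = D$. The function names are hypothetical.

```python
import math

def qubit_trace_distance(r1, r2):
    """Trace distance of qubit states with Bloch vectors r1, r2: |r1 - r2| / 2."""
    return 0.5 * math.dist(r1, r2)

def helstrom_success(r1, r2):
    """Equiprobable optimal success probability 1/2 + D/2 (Helstrom)."""
    return 0.5 + 0.5 * qubit_trace_distance(r1, r2)

def gudder_d(p_opt):
    """Gudder's first metric d from relation (7): d = 1 - 1 / (2 P_opt)."""
    return 1.0 - 1.0 / (2.0 * p_opt)

def intrinsic_metric(p_opt):
    """Gudder's intrinsic metric from relation (8): 2 P_opt - 1."""
    return 2.0 * p_opt - 1.0

r1, r2 = (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)   # |0> and |+> on the Bloch sphere
p_opt = helstrom_success(r1, r2)
d = gudder_d(p_opt)
```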
The equivalence (12) provides simple proofs of several properties of $\bar d(s'_1, s'_2)$ originally shown by Gudder [9]. For instance, since the classical trace distance $D_c(p_i, q_i)$ is well known to be a metric, so is our trace distance $D(s'_1, s'_2)$ by definition; therefore $\bar d(s'_1, s'_2)$ is indeed a metric as well (for the positivity of $D(s'_1, s'_2)$ with $s'_1 \ne s'_2$ we need the fact that the state space is separated). We also consider another important property, the monotonicity of $\bar d(s'_1, s'_2)$.

Theorem 5.5 (Gudder [9]). For any states $s'_1, s'_2 \in S$ and any affine map $F : S \to S$, we have

$$\bar d\bigl(F(s'_1), F(s'_2)\bigr) \le \bar d(s'_1, s'_2).$$

Now this fact is an easy consequence of the equivalence (12) and the fact that affine maps are closed under composition. Namely, for any observable $(e_i)_i \in \mathcal O$, by putting $f_i = e_i \circ F : S \to [0, 1]$ we have

$$D_c\bigl(e_i(F(s'_1)), e_i(F(s'_2))\bigr) = D_c\bigl(f_i(s'_1), f_i(s'_2)\bigr). \tag{13}$$

Now $(f_i)_i$ is also an observable; therefore the supremum of the left-hand side of (13) over $(e_i)_i \in \mathcal O$ does not exceed the supremum of the right-hand side of (13) over all observables $(f_i)_i$. This implies the monotonicity of $D(s'_1, s'_2)$, and hence of $\bar d(s'_1, s'_2)$. (We note that the quantity on the right-hand side of (11) was also investigated in [5] in a slightly different context; for instance, it was shown to be a metric there, and its monotonicity was also proven.)
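For classical theories the monotonicity argument can be checked numerically. The sketch below assumes the standard fact that affine maps of the probability simplex into itself are given by column-stochastic matrices, and uses the observation that, by (11), $D$ reduces to the classical trace distance for classical states. It samples random stochastic matrices and confirms that none of them increases $D$, in line with Theorem 5.5 and the equivalence (12). The helper names are hypothetical.

```python
import random

def apply_channel(matrix, state):
    """Affine map F on the simplex given by a column-stochastic matrix: (F s)(y) = sum_x M[y][x] s(x)."""
    return [sum(row[x] * state[x] for x in range(len(state))) for row in matrix]

def random_channel(dim, rng):
    """Random column-stochastic dim x dim matrix."""
    cols = []
    for _ in range(dim):
        w = [rng.random() for _ in range(dim)]
        total = sum(w)
        cols.append([v / total for v in w])
    return [[cols[x][y] for x in range(dim)] for y in range(dim)]

def trace_distance(p, q):
    """Classical trace distance, which equals D of (11) for classical states."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

rng = random.Random(3)
s1, s2 = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
for _ in range(500):
    F = random_channel(3, rng)
    assert trace_distance(apply_channel(F, s1), apply_channel(F, s2)) <= trace_distance(s1, s2) + 1e-12
```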

Summarizing, we have shown that Gudder's intrinsic metric has two operational meanings: one is given directly through the classical trace distance (12), and the other is given by the optimal success probability for discriminating two states under a uniform distribution (8).

Remark 5.2. As an application of Gudder's intrinsic metric, or of the trace distance defined above, we obtain a simple (qualitative) version of the information-disturbance theorem in general probabilistic theories. Before giving the theorem, we clarify the meaning of some terminology. We say that a state $s$ is a pure state if $s$ is an extreme point of the state space. We say that two states are indistinguishable if they are not distinguishable in the sense of Definition 4.3. Then the above-mentioned theorem is the following:

Theorem 5.6. In any general probabilistic theory, any attempt to distinguish two indistinguish- 
able pure states causes a disturbance. 

This theorem is a generalization of the well-known corresponding theorem in quantum theory 
(see e.g., Proposition 12.18 in [15]) to arbitrary general probabilistic theories. It is known that 
a general probabilistic theory is non-classical if and only if there exist indistinguishable pure 
states [1]. Hence one can conclude that the information disturbance property inevitably holds 
for any non-classical general probabilistic theory, not only for quantum theory. 

Before presenting the proof, notice that any dynamics on $S$ should be described by an affine map $F : S \to S$ in order to preserve probabilistic mixtures, while the composition of state spaces $S_1$ and $S_2$ is given by a tensor product $S_1 \otimes S_2$ (see [1] and references therein).

Proof of Theorem 5.6. Let $s_1, s_2 \in \mathcal{S}$ be two indistinguishable pure states (thus $P_{\mathrm{succ}}(s_1, s_2) < 1$). Let $s_i \otimes s_0$ ($i = 1, 2$) be the initial states on $\mathcal{S} \otimes \mathcal{S}'$, where $s_0 \in \mathcal{S}'$ is any fixed state to which the information about $s_1$ or $s_2$ is transferred. Assume, to the contrary, that one can extract information with which one distinguishes $s_1$ and $s_2$ without causing any disturbance. More precisely, we assume that there exists an information transfer machine described by an affine map $F\colon \mathcal{S} \otimes \mathcal{S}' \to \mathcal{S} \otimes \mathcal{S}'$ such that the reduced state of $F(s_i \otimes s_0)$ on the first system $\mathcal{S}$ remains $s_i$ (i.e., no disturbance is caused), while the reduced states of $F(s_1 \otimes s_0)$ and $F(s_2 \otimes s_0)$ on the second system $\mathcal{S}'$ are distinct (i.e., one can extract some information to distinguish $s_1$ and $s_2$). Now it is easy to show that if a reduced state is pure, then the whole state must be a product state, by showing that there are no correlations between any pair of observables (or effects). Therefore, we have

$$F(s_1 \otimes s_0) = s_1 \otimes t_1, \qquad F(s_2 \otimes s_0) = s_2 \otimes t_2,$$

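with $t_1 \neq t_2 \in \mathcal{S}'$. Using the machine $F$ $N$ times, one obtains an affine transformation $\tilde{F}$ on $\mathcal{S} \otimes \mathcal{S}'^{\otimes N}$ such that

$$\tilde{F}(s_i \otimes s_0^{\otimes N}) = s_i \otimes t_i^{\otimes N} \qquad (i = 1, 2).$$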

Physically, this means that one obtains an arbitrarily large number of copies of the (distinct) state $t_1$ or $t_2$, and thereby can distinguish them with success probability arbitrarily close to 1. In other words, the optimal success probability to distinguish $\tilde{F}(s_1 \otimes s_0^{\otimes N})$ and $\tilde{F}(s_2 \otimes s_0^{\otimes N})$ can be made exponentially close to 1 with respect to $N$ (to see this formally, use the Chernoff bound [7], for instance). On the other hand, we have

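$$1 > P_{\mathrm{succ}}(s_1, s_2) = P_{\mathrm{succ}}(s_1 \otimes s_0^{\otimes N}, s_2 \otimes s_0^{\otimes N}) \geq P_{\mathrm{succ}}\bigl(\tilde{F}(s_1 \otimes s_0^{\otimes N}), \tilde{F}(s_2 \otimes s_0^{\otimes N})\bigr)$$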

for any $N$, where the last inequality follows from Theorem 5.5 and (8). This is a contradiction, since the last term converges to 1 as $N \to \infty$, as mentioned above. Hence the proof of Theorem 5.6 is concluded. □
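The step "exponentially close to 1" can be illustrated in the simplest classical setting, assuming that $t_1$ and $t_2$ yield Bernoulli statistics with distinct parameters under some fixed measurement. The Python sketch below computes the exact minimum error for $N$ i.i.d. copies under equal priors and compares it with the Chernoff-type upper bound $\frac{1}{2}\bigl(\min_{0 \le s \le 1} \sum_x P(x)^s Q(x)^{1-s}\bigr)^N$, which follows from $\min(a, b) \le a^s b^{1-s}$; the minimum over $s$ is only approximated on a grid (every grid point still gives a valid bound), and the parameter values are hypothetical.

```python
from math import comb

def optimal_error_n_copies(p, q, n):
    """Minimum error (equal priors) for n i.i.d. copies of Bernoulli(p) vs Bernoulli(q).

    Equals (1/2) * sum over outcome strings of min(P, Q), grouped by the number k of ones.
    """
    return 0.5 * sum(
        comb(n, k) * min(p**k * (1 - p) ** (n - k), q**k * (1 - q) ** (n - k))
        for k in range(n + 1)
    )

def chernoff_bound(p, q, n, grid=1000):
    """Upper bound (1/2) * (min_s sum_x P(x)^s Q(x)^(1-s))^n, with s restricted to a grid."""
    best = min(
        p**s * q ** (1 - s) + (1 - p) ** s * (1 - q) ** (1 - s)
        for s in (i / grid for i in range(grid + 1))
    )
    return 0.5 * best**n

p, q = 0.6, 0.4  # hypothetical single-copy statistics of t_1 and t_2
results = []
for n in (1, 10, 50, 100):
    err = optimal_error_n_copies(p, q, n)
    results.append((n, err, chernoff_bound(p, q, n)))
    print(f"N={n:3d}  optimal success={1 - err:.4f}  bound on error={results[-1][2]:.4f}")
```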



5.4 Proof of Theorem 5.3

In this subsection, we give a proof of Theorem 5.3; namely, we prove that $t_1$ and $t_2$ in $\overline{\mathcal{S}}$ are distinguishable if $(t_1, t_2) \in \mathcal{C}$ (see Definition 4.3 for terminology).

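First, we would like to reduce our argument to the special case $\bar{t}_2 = -\bar{t}_1$. For this purpose, let $v_0 = (t_1 + t_2)/2 \in \overline{\mathcal{S}}$ and put $C = \overline{\mathcal{S}} - v_0$, which is also a convex subset of $\overline{V}$. Moreover, put $\bar{t}_1 = t_1 - v_0$ and $\bar{t}_2 = t_2 - v_0$. Then we have $\bar{t}_1, \bar{t}_2 \in C$ and $\bar{t}_2 = -\bar{t}_1$. Note that $\bar{t}_1 \neq \bar{t}_2$ since $t_1 \neq t_2$.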

The outline of our proof is as follows. First, note that the existence of a virtual effect $e$ such that $e(t_1) = 1$ and $e(t_2) = 0$ (which is nothing but our goal) is obvious if $\overline{V}$ coincides with the 1-dimensional linear subspace $W'$ spanned by $\bar{t}_1$ (hence by $\bar{t}_2$). To construct such an $e$ in the more general case, we would like to extend a nonzero linear functional $f$ on $W'$ (note that $f$ is continuous on $W'$ and $f(C \cap W')$ is bounded in $\mathbb{R}$) to a continuous linear functional $\bar{f}$ on $\overline{V}$ such that $\bar{f}(C)$ is bounded in $\mathbb{R}$. Then it will be shown that the restriction to $\overline{\mathcal{S}}$ of an appropriate affine transformation $h = \alpha \bar{f} + \beta$ of $\bar{f}$ ($\alpha, \beta \in \mathbb{R}$) is the desired virtual effect $e$. To construct such an extension $\bar{f}$ of $f$, we first use Theorem 5.2 to obtain an extension $f'$ of $f$ to $W = \overline{V}$ (not yet necessarily continuous) such that $f'(C)$ is bounded in $\mathbb{R}$, and then we further modify the functional $f'$ by using Theorem D.1 to obtain $\bar{f}$.

To carry out this program, we start with the linear functional $f$ on the 1-dimensional subspace $W'$ such that $f(\lambda \bar{t}_1) = \lambda$ for each $\lambda \in \mathbb{R}$; therefore $f(\bar{t}_1) = 1$ and $f(\bar{t}_2) = -1$. To apply Theorem 5.2, we would like to take an appropriate seminorm $g$ on $\overline{V}$, more precisely, the Minkowski functional of a certain subset $\widetilde{C}$ of $\overline{V}$ (see Proposition 5.3). We now define the subset $\widetilde{C}$. Note that the convex subset $C$ of $\overline{V}$ contains the origin of $\overline{V}$; therefore we have $\lambda x \in C$ for any $x \in C$ and $0 \leq \lambda \leq 1$. Thus the subset $\pm C = C \cup (-C)$ of $\overline{V}$ is circled (see Sect. 5.1 for terminology). Now define $\widetilde{C}$ to be the convex hull $\mathrm{Conv}(\pm C)$ of $\pm C$, which is also a circled subset of $\overline{V}$. By the convexity of $C$, any element $v$ of $\widetilde{C}$ can be written as $v = \lambda x - \lambda' x'$ with $x, x' \in C$, $\lambda, \lambda' \geq 0$, and $\lambda + \lambda' = 1$. This subset $\widetilde{C}$ has the following property:

Lemma 5.5. $\widetilde{C}$ is a radial subset of $\overline{V}$ (see Sect. 5.1 for terminology).
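Proof. Let $W_0$ be the set of all $v \in \overline{V}$ such that $v \in \lambda \widetilde{C}$ for some $\lambda > 0$. Then $W_0$ contains $\widetilde{C}$, hence $C$. Moreover, if $v \in W_0$, $\lambda_0 > 0$, and $v \in \lambda_0 \widetilde{C}$, then we have $v \in \lambda \widetilde{C}$ whenever $|\lambda| \geq \lambda_0$, since $\widetilde{C}$ is circled. Thus $\widetilde{C}$ is radial if $W_0 = \overline{V}$. To prove $W_0 = \overline{V}$, it suffices to show that $W_0$ is a linear subspace of $\overline{V}$. Indeed, once this is proven, $W_0 + v_0$ will be an affine subspace of $\overline{V}$ containing $\overline{\mathcal{S}}$ (recall that $W_0 \supseteq C = \overline{\mathcal{S}} - v_0$); therefore $W_0 + v_0 = \overline{V}$ (hence $W_0 = \overline{V}$) since $\mathrm{Aff}(\overline{\mathcal{S}}) = \overline{V}$.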

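Let $v_1, v_2 \in W_0$. Then for each $i$, we have $v_i = \lambda_i x_i$ for some $\lambda_i > 0$ and $x_i \in \widetilde{C}$. Moreover, let $\mu_1, \mu_2 \in \mathbb{R}$, and write $\mu_i = \varepsilon_i \nu_i$ with $\varepsilon_i \in \{\pm 1\}$ and $\nu_i \geq 0$ for each $i$. We show that $\mu_1 v_1 + \mu_2 v_2 \in W_0$; since this is obvious when $\nu_1 = \nu_2 = 0$, we assume from now on that $\nu_1 > 0$ or $\nu_2 > 0$. Then, by putting $x'_i = \varepsilon_i x_i \in \widetilde{C}$ for each $i$ (note that $\widetilde{C}$ is circled), we have

$$\mu_1 v_1 + \mu_2 v_2 = \lambda_1 \nu_1 x'_1 + \lambda_2 \nu_2 x'_2 = (\lambda_1 \nu_1 + \lambda_2 \nu_2) \, \frac{\lambda_1 \nu_1 x'_1 + \lambda_2 \nu_2 x'_2}{\lambda_1 \nu_1 + \lambda_2 \nu_2},$$

therefore $\mu_1 v_1 + \mu_2 v_2 \in (\lambda_1 \nu_1 + \lambda_2 \nu_2) \widetilde{C}$ by the convexity of $\widetilde{C}$. Hence we have $\mu_1 v_1 + \mu_2 v_2 \in W_0$; therefore Lemma 5.5 holds. □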
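The key identity in the proof of Lemma 5.5 can be sanity-checked numerically. In the sketch below, random vectors in $\mathbb{R}^3$ stand in for $x_1, x_2 \in \widetilde{C}$ (an assumption made only for illustration); the check confirms that $\mu_1 v_1 + \mu_2 v_2$ equals $(\lambda_1 \nu_1 + \lambda_2 \nu_2)$ times a convex combination of $x'_1 = \varepsilon_1 x_1$ and $x'_2 = \varepsilon_2 x_2$.

```python
import numpy as np

rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=3), rng.normal(size=3)  # stand-ins for x_1, x_2 in C~
lam1, lam2 = 0.7, 2.5                            # v_i = lam_i * x_i with lam_i > 0
v1, v2 = lam1 * x1, lam2 * x2
mu1, mu2 = -1.3, 0.4                             # arbitrary real coefficients

eps = np.sign([mu1, mu2])                        # mu_i = eps_i * nu_i
nu = np.abs([mu1, mu2])                          # nu_i >= 0
xp1, xp2 = eps[0] * x1, eps[1] * x2              # x'_i = eps_i * x_i (C~ is circled)

scale = lam1 * nu[0] + lam2 * nu[1]
weights = np.array([lam1 * nu[0], lam2 * nu[1]]) / scale  # convex weights
convex_point = weights[0] * xp1 + weights[1] * xp2

lhs = mu1 * v1 + mu2 * v2
rhs = scale * convex_point
print(np.allclose(lhs, rhs), weights)
```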



Owing to the above properties of $\widetilde{C}$, we define the seminorm $g$ to be the Minkowski functional $g_{\widetilde{C}}$ of $\widetilde{C}$ (see Proposition 5.3). Note that $g(v) \leq 1$ for any $v \in \widetilde{C}$ by the definition of $g_{\widetilde{C}}$.
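As an illustration of the Minkowski functional $g_{\widetilde{C}}(v) = \inf\{\lambda > 0 : v \in \lambda \widetilde{C}\}$, the Python sketch below evaluates it by bisection from a membership oracle. It uses a hypothetical circled convex set in $\mathbb{R}^2$ (a box) whose Minkowski functional is known in closed form, $g(v) = \max(|v_1|, |v_2|/2)$, and spot-checks the two seminorm properties used in the argument.

```python
def in_box(x):
    """Membership oracle for a hypothetical circled convex set: the box [-1, 1] x [-2, 2]."""
    return abs(x[0]) <= 1.0 and abs(x[1]) <= 2.0

def minkowski(member, v, hi=1e6, iters=200):
    """Approximate g(v) = inf{lam > 0 : v in lam * C} by bisection on lam.

    Assumes C is convex, circled and radial, so that "v / lam in C" is monotone in lam
    and holds for a large enough lam (here: lam = hi).
    """
    if not member([c / hi for c in v]):
        raise ValueError("increase hi: v is not absorbed at this scale")
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if member([c / mid for c in v]):
            hi = mid  # v lies in mid * C, so g(v) <= mid
        else:
            lo = mid  # v does not lie in mid * C, so g(v) >= mid
    return hi

v, w = [0.5, 3.0], [-0.8, 1.0]
print(minkowski(in_box, v))  # close to max(0.5, 3.0 / 2) = 1.5
# absolute homogeneity g(-2v) = 2 g(v) and the triangle inequality g(v + w) <= g(v) + g(w)
print(abs(minkowski(in_box, [-2 * c for c in v]) - 2 * minkowski(in_box, v)) < 1e-9)
print(minkowski(in_box, [a + b for a, b in zip(v, w)])
      <= minkowski(in_box, v) + minkowski(in_box, w) + 1e-9)
```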

From now on, to apply Theorem 5.2, we show that $|f(v)| \leq g(v)$ for any $v \in W'$. Since $W'$ is 1-dimensional and $g$ is a seminorm, it suffices to show that $g(\bar{t}_1) = 1 = f(\bar{t}_1)$. This is proven in the following lemma:

Lemma 5.6. We have $g(\bar{t}_1) = 1$.

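Proof. First, note that $g(\bar{t}_1) \leq 1$ since $\bar{t}_1 \in C \subseteq \widetilde{C}$. We show that $g(\bar{t}_1) \geq 1$, or equivalently, that there do not exist an element $v \in \widetilde{C}$ and $c > 1$ such that $v = c \bar{t}_1$. Assume, to the contrary, that such a pair $(v, c)$ exists. As mentioned before, this $v$ is of the form $v = \lambda x - (1 - \lambda) x'$ with $x, x' \in C$ and $0 \leq \lambda \leq 1$; therefore $\lambda x - (1 - \lambda) x' = c \bar{t}_1 = -c \bar{t}_2$. Moreover, by the definition of $C$, we have $x = s - v_0$ and $x' = s' - v_0$ for some $s, s' \in \overline{\mathcal{S}}$; therefore

$$v = \lambda s - (1 - \lambda) s' + (1 - 2\lambda) v_0 = c \bar{t}_1 = -c \bar{t}_2.$$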

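Note also that $t_2 - t_1 = \bar{t}_2 - \bar{t}_1 = 2\bar{t}_2 = -2\bar{t}_1$. We now show that, by using the convexity of $\overline{\mathcal{S}}$, we can construct a pair $(t'_1, t'_2) \in \mathcal{C}^{\mathrm{weak}}$ such that $\ell(t'_1, t'_2) > \ell(t_1, t_2)$, contradicting the assumption $(t_1, t_2) \in \mathcal{C}$.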

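First we consider the case $p_1 = p_2$. If $\lambda \leq 1/2$, then we have

$$(1 - \lambda) s' - \lambda s - (1 - 2\lambda) t_1 = -v - (1 - 2\lambda) \bar{t}_1 = (c + 1 - 2\lambda) \bar{t}_2,$$

therefore $s' - s'' = a(t_2 - t_1)$, where $s'' = (\lambda s + (1 - 2\lambda) t_1)/(1 - \lambda) \in \overline{\mathcal{S}}$ (note that $\overline{\mathcal{S}}$ is convex) and $a = (c + 1 - 2\lambda)/(2 - 2\lambda)$. Since $c > 1$, we have $a > 1$; therefore $(s'', s') \in \mathcal{C}^{\mathrm{weak}}$ and $\ell(s'', s') = a \, \ell(t_1, t_2) > \ell(t_1, t_2)$, as desired. Similarly, if $\lambda > 1/2$, then we have

$$(2\lambda - 1) t_2 + (1 - \lambda) s' - \lambda s = -v + (2\lambda - 1) \bar{t}_2 = (c + 2\lambda - 1) \bar{t}_2,$$

therefore $s'' - s = a(t_2 - t_1)$, where $s'' = (2 - \lambda^{-1}) t_2 + (\lambda^{-1} - 1) s' \in \overline{\mathcal{S}}$ and $a = (c + 2\lambda - 1)/(2\lambda) > 1$. Thus we have $(s, s'') \in \mathcal{C}^{\mathrm{weak}}$ and $\ell(s, s'') > \ell(t_1, t_2)$, as desired.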

Secondly, we consider the case that $p_1 > p_2$ and $s^* \in \mathcal{S}$. Put $\ell = \ell(t_1, t_2)$ for simplicity. Note that $0 < \ell < 1$ and

$$\ell s^* = t_2 - (1 - \ell) t_1 = 2v_0 - (2 - \ell) t_1 = (2 - \ell) t_2 - (2 - 2\ell) v_0,$$






while

$$v = \lambda s - (1 - \lambda) s' + (1 - 2\lambda) v_0 = c t_1 - c v_0 = c v_0 - c t_2.$$

Put $\mu = (2 - \ell)(2\lambda - 1) + c\ell$. If $\mu \geq 0$, then the above relations imply that

$$\lambda (2 - 2\ell) s + \ell (c + 2\lambda - 1) s^* = (1 - \lambda)(2 - 2\ell) s' + \mu t_2.$$

Now the coefficients of $s$, $s^*$, $s'$, and $t_2$ in this equality are all nonnegative, and the sums of the two coefficients on the left-hand side and on the right-hand side, respectively, are positive and equal to each other; namely,

$$\lambda (2 - 2\ell) + \ell (c + 2\lambda - 1) = (1 - \lambda)(2 - 2\ell) + \mu = c\ell + 2\lambda - \ell > 0.$$

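Thus, by the convexity of $\overline{\mathcal{S}}$, we have $(1 - a) s + a s^* = s''$ for some $s'' \in \overline{\mathcal{S}}$, where

$$a = \frac{\ell (c + 2\lambda - 1)}{c\ell + 2\lambda - \ell} = 1 - \frac{2\lambda (1 - \ell)}{c\ell + 2\lambda - \ell} \in (\ell, 1]$$

(note that $0 < \ell < 1$ and $c > 1$). Thus we have $(s, s'') \in \mathcal{C}^{\mathrm{weak}}$ and $\ell(s, s'') = a > \ell$, as desired. Similarly, if $\mu < 0$, then we have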

$$2\lambda s + |\mu| t_1 + \ell (c + 1 - 2\lambda) s^* = (2 - 2\lambda) s'.$$

Since $c > 1$, all four coefficients in this equality are nonnegative, and the sum of the three coefficients on the left-hand side is equal to the coefficient $2 - 2\lambda > 0$ on the right-hand side; namely,

$$2\lambda + |\mu| + \ell (c + 1 - 2\lambda) = 2 - 2\lambda > 0.$$

Thus, by the convexity of $\overline{\mathcal{S}}$, we have $(1 - a) s'' + a s^* = s'$ for some $s'' \in \overline{\mathcal{S}}$, where $a = \ell (c + 1 - 2\lambda)/(2 - 2\lambda) \in (\ell, 1]$ (note that $\ell > 0$ and $c > 1$). Thus we have $(s'', s') \in \mathcal{C}^{\mathrm{weak}}$ and $\ell(s'', s') = a > \ell$, as desired.

Hence our claim holds in all cases; therefore Lemma 5.6 holds. □
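Since the case analysis in the proof of Lemma 5.6 rests on several linear identities, a numerical spot check may help when reading it. The sketch below assumes generic random vectors in $\mathbb{R}^4$ for $t_1$, $t_2$, $s$, $s^*$ and sample values of $\lambda$, $c > 1$, and $\ell \in (0, 1)$; it solves $v = \lambda s - (1 - \lambda) s' + (1 - 2\lambda) v_0 = c \bar{t}_1$ for $s'$ and then verifies each identity used in the two cases, together with the claimed bounds on $a$. It checks only the algebra, not membership in $\overline{\mathcal{S}}$; the helper name `s_prime_from` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
close = np.allclose

def s_prime_from(s, lam, c, v0, t1bar):
    """Solve v = lam*s - (1-lam)*s' + (1-2*lam)*v0 = c*t1bar for s' (requires lam < 1)."""
    return (lam * s + (1 - 2 * lam) * v0 - c * t1bar) / (1 - lam)

checks = {}

# Case p1 = p2: arbitrary t1, t2, s and c > 1
t1, t2, s = rng.normal(size=(3, 4))
v0, c = (t1 + t2) / 2, 1.7
t1bar = t1 - v0
for lam in (0.3, 0.8):
    sp = s_prime_from(s, lam, c, v0, t1bar)
    if lam <= 0.5:
        s2 = (lam * s + (1 - 2 * lam) * t1) / (1 - lam)
        a = (c + 1 - 2 * lam) / (2 - 2 * lam)
        checks[("p1=p2", lam)] = close(sp - s2, a * (t2 - t1)) and a > 1
    else:
        s2 = (2 - 1 / lam) * t2 + (1 / lam - 1) * sp
        a = (c + 2 * lam - 1) / (2 * lam)
        checks[("p1=p2", lam)] = close(s2 - s, a * (t2 - t1)) and a > 1

# Case p1 > p2: t2 = ell * s_star + (1 - ell) * t1 with 0 < ell < 1
t1, s_star, s = rng.normal(size=(3, 4))
ell, c = 0.35, 1.2
t2 = ell * s_star + (1 - ell) * t1
v0 = (t1 + t2) / 2
t1bar = t1 - v0
checks["ell*s_star"] = close(ell * s_star, (2 - ell) * t2 - (2 - 2 * ell) * v0)
for lam in (0.6, 0.1):  # lam = 0.6 gives mu >= 0, lam = 0.1 gives mu < 0
    sp = s_prime_from(s, lam, c, v0, t1bar)
    mu = (2 - ell) * (2 * lam - 1) + c * ell
    if mu >= 0:
        lhs = lam * (2 - 2 * ell) * s + ell * (c + 2 * lam - 1) * s_star
        rhs = (1 - lam) * (2 - 2 * ell) * sp + mu * t2
        total = c * ell + 2 * lam - ell
        a = ell * (c + 2 * lam - 1) / total
        s2 = rhs / total  # convex combination of s' and t2
        checks[("mu>=0", lam)] = (close(lhs, rhs)
                                  and close((1 - a) * s + a * s_star, s2) and ell < a <= 1)
    else:
        lhs = 2 * lam * s + abs(mu) * t1 + ell * (c + 1 - 2 * lam) * s_star
        a = ell * (c + 1 - 2 * lam) / (2 - 2 * lam)
        s2 = (2 * lam * s + abs(mu) * t1) / (2 * lam + abs(mu))  # convex combination of s and t1
        checks[("mu<0", lam)] = (close(lhs, (2 - 2 * lam) * sp)
                                 and close((1 - a) * s2 + a * s_star, sp) and ell < a <= 1)

print(checks)
```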



Thus, by Theorem 5.2, the functional $f$ on $W'$ extends to an $f' \in \mathcal{L}(\overline{V})$ such that $|f'(v)| \leq g(v)$ for any $v \in \overline{V}$. Since $f'|_{W'} = f$, we have $f'(\bar{t}_1) = 1$, $f'(\bar{t}_2) = -1$, and $|f'(x)| \leq g(x) \leq 1$ for any $x \in C$; therefore $f'(C) \subseteq [-1, 1]$. By putting $a = f'(v_0)$, it follows that

$$f'(t_1) = a + 1, \qquad f'(t_2) = a - 1, \qquad f'(\overline{\mathcal{S}}) \subseteq [a - 1, a + 1];$$

therefore the restriction of $f'$ to $V$ is continuous, since it is bounded on $\mathcal{S} \subseteq \overline{\mathcal{S}}$. Our desired virtual effect $e$ could be constructed directly from this $f'$ if $f'$ were also continuous on $\overline{V}$; however, this is not guaranteed in general.

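Thus, instead, by using Theorem D.1 we take a continuous linear functional $\bar{f}$ on $\overline{V}$ such that $\bar{f}|_V = f'|_V$. Note that $\bar{f}(\mathcal{S}) \subseteq [a - 1, a + 1]$ since $\mathcal{S} \subseteq \overline{\mathcal{S}} \cap V$; therefore we have $\bar{f}(\overline{\mathcal{S}}) \subseteq [a - 1, a + 1]$ since $\overline{\mathcal{S}} = \mathrm{cl}_{\overline{V}}(\mathcal{S})$ and $\bar{f}$ is continuous. We now show that $\bar{f}(t_1) = a + 1$ and $\bar{f}(t_2) = a - 1$.
First, we consider the case $p_1 = p_2$. Then we have $t_1 - t_2 = c(s_2 - s_1)$ with $c = \ell(t_1, t_2) > 0$, while $s_2 - s_1 \in V$ since $s_1, s_2 \in \mathcal{S}$; therefore

$$\bar{f}(t_2 - t_1) = c \bar{f}(s_1 - s_2) = c f'(s_1 - s_2) = f'(t_2 - t_1) = -2.$$

Since $\bar{f}(\overline{\mathcal{S}}) \subseteq [a - 1, a + 1]$ as mentioned above, we have

$$a - 1 \leq \bar{f}(t_2) = \bar{f}(t_1) - 2 \leq a + 1 - 2 = a - 1,$$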
therefore $\bar{f}(t_2) = a - 1$ and $\bar{f}(t_1) = \bar{f}(t_2) + 2 = a + 1$. Secondly, we consider the case $p_1 > p_2$ and $s^* \in \mathcal{S}$. Now $t_2 = c s^* + (1 - c) t_1$ with $0 < c = \ell(t_1, t_2) < 1$, while $s^* \in \mathcal{S} \subseteq V$; therefore

$$\bar{f}(t_2) - (1 - c) \bar{f}(t_1) = c \bar{f}(s^*) = c f'(s^*) = f'(t_2) - (1 - c) f'(t_1).$$

Now we have $\bar{f}(t_1) \leq a + 1 = f'(t_1)$; therefore $\bar{f}(t_2) \leq f'(t_2) = a - 1$ since $1 - c > 0$. Thus we have $\bar{f}(t_2) = a - 1$ since $\bar{f}(t_2) \geq a - 1$; therefore $\bar{f}(t_1) = f'(t_1) = a + 1$. Hence we have $\bar{f}(t_1) = a + 1$ and $\bar{f}(t_2) = a - 1$ in any case.

Finally, by the above properties, the affine functional $h = (\bar{f} + 1 - a)/2$ on $\overline{V}$ is continuous and satisfies $h(t_1) = 1$, $h(t_2) = 0$, and $h(\overline{\mathcal{S}}) \subseteq [0, 1]$. This implies that $e = h|_{\overline{\mathcal{S}}}$ is a virtual effect that distinguishes $t_1$ and $t_2$.

Hence the proof of Theorem 5.3 is concluded.
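The final rescaling step can be illustrated with a toy example. Assume, hypothetically, that $\overline{\mathcal{S}}$ is the unit square in $\mathbb{R}^2$, $t_1 = (1, 0)$, $t_2 = (0, 1)$, and that $\bar{f}(x, y) = x - y$ plays the role of the extended functional, so that $\bar{f}(\overline{\mathcal{S}}) \subseteq [a - 1, a + 1]$ with $a = \bar{f}(v_0) = 0$. The sketch checks that $h = (\bar{f} + 1 - a)/2$ takes values in $[0, 1]$ on the vertices (hence on the whole square, since $h$ is affine) and separates $t_1$ from $t_2$.

```python
from itertools import product

def fbar(p):
    """Hypothetical extended linear functional fbar(x, y) = x - y."""
    return p[0] - p[1]

t1, t2 = (1.0, 0.0), (0.0, 1.0)
v0 = tuple((x + y) / 2 for x, y in zip(t1, t2))
a = fbar(v0)  # a = fbar(v0)

def h(p):
    """Affine rescaling h = (fbar + 1 - a) / 2."""
    return (fbar(p) + 1 - a) / 2

vertices = list(product((0.0, 1.0), repeat=2))  # vertices of the unit square
print(h(t1), h(t2))                             # 1.0 0.0
print(all(0 <= h(p) <= 1 for p in vertices))    # True
```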
Appendix: Proof of Theorem 2.1

In the appendix, we give a proof of Theorem 2.1. In what follows, for any convex structure $\mathcal{C}$, let $\mathcal{A}_C(\mathcal{C})$ be the set of all $f \in \mathcal{A}(\mathcal{C})$ bounded on a subset $C$ of $\mathcal{C}$. Moreover, for any convex subset $C$ of a t.v.s., let $\mathcal{A}_{\mathrm{c}}(C)$ denote the set of all continuous $f \in \mathcal{A}(C)$.

A Construction of $\mathcal{S}$ and $V$

First, we describe the construction of a vector space $V$ and its convex subset $\mathcal{S}$ such that $\mathcal{S}$ is isomorphic to the separated convex structure $\mathcal{S}_0$ and $V = \mathrm{Aff}(\mathcal{S})$. Here we abuse the notations $\mathcal{S}$ and $V$, though these $\mathcal{S}$ and $V$ are in fact not necessarily the same as (but are isomorphic to) $\mathcal{S}$ and $V$ in Theorem 2.1, respectively. Although our argument is essentially the standard one (cf., e.g., [18]), we give it here for the sake of completeness.

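Our argument is the following. In what follows, let $\mathcal{L}(W)$ denote the set of all linear functionals on a vector space $W$, and for any convex structure $\mathcal{C}$, let $\mathcal{A}(\mathcal{C})$ denote the set of all affine functionals on $\mathcal{C}$. Then the set $\mathcal{A}(\mathcal{S}_0)$ forms a vector space with the natural addition and scalar multiplication; therefore its dual space $\mathcal{A}(\mathcal{S}_0)^* = \mathcal{L}(\mathcal{A}(\mathcal{S}_0))$ is also a vector space. We define an "evaluation map" $\mathrm{ev}_s\colon \mathcal{A}(\mathcal{S}_0) \to \mathbb{R}$ for each $s \in \mathcal{S}_0$ by $\mathrm{ev}_s(f) = f(s)$ for $f \in \mathcal{A}(\mathcal{S}_0)$. Then a straightforward argument shows that $\mathrm{ev}_s \in \mathcal{A}(\mathcal{S}_0)^*$ for every $s \in \mathcal{S}_0$, and that the map $\psi\colon \mathcal{S}_0 \to \mathcal{A}(\mathcal{S}_0)^*$, $\psi(s) = \mathrm{ev}_s$, is a homomorphism of convex structures, i.e., $\psi((\lambda, \mu; s, t)) = \lambda \psi(s) + \mu \psi(t)$ for any $s, t \in \mathcal{S}_0$. The fact that $\mathcal{S}_0$ is separated (Lemma 2.1) implies that $\psi$ is injective. Moreover, by fixing an element $v \in \psi(\mathcal{S}_0)$, the map $\varphi\colon \mathcal{S}_0 \to \mathcal{A}(\mathcal{S}_0)^*$, $\varphi(s) = \psi(s) - v$, is also an injective homomorphism of convex structures. Thus $\mathcal{S} = \varphi(\mathcal{S}_0)$ is a convex subset of the vector space $\mathcal{A}(\mathcal{S}_0)^*$ containing the origin of $\mathcal{A}(\mathcal{S}_0)^*$. Now $V = \mathrm{Aff}(\mathcal{S})$ is a linear subspace of $\mathcal{A}(\mathcal{S}_0)^*$. Thus $\mathcal{S}$ and $V$ are obtained.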
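A finite-dimensional illustration of the evaluation-map construction: assume that $\mathcal{S}_0$ is a full-dimensional convex subset of $\mathbb{R}^2$, so that every affine functional has the form $f(x) = a \cdot x + b$ and $\mathcal{A}(\mathcal{S}_0)$ can be identified with coefficient vectors $(a_1, a_2, b) \in \mathbb{R}^3$. Under this assumption $\mathrm{ev}_s$ is represented by the vector $(s_1, s_2, 1)$. The sketch checks that $\psi$ respects convex combinations and that the shifted map $\varphi$ sends the chosen reference point to the origin; all concrete points and coefficients are hypothetical.

```python
import numpy as np

def ev(s):
    """ev_s: (a, b) -> a.s + b, represented as a vector in A(S0)* ~ R^3."""
    return np.array([s[0], s[1], 1.0])

def apply_functional(functional, coeffs):
    """Evaluate an element of A(S0)* on the affine functional with coefficients (a1, a2, b)."""
    return float(functional @ coeffs)

s, t = np.array([0.2, 0.5]), np.array([0.7, 0.1])
lam, mu = 0.3, 0.7                    # convex weights
coeffs = np.array([1.5, -2.0, 0.25])  # an arbitrary affine functional f(x) = a.x + b

lhs = ev(lam * s + mu * t)            # psi((lam, mu; s, t))
rhs = lam * ev(s) + mu * ev(t)        # lam psi(s) + mu psi(t)
print(np.allclose(lhs, rhs))
print(np.isclose(apply_functional(ev(s), coeffs), coeffs[:2] @ s + coeffs[2]))  # ev_s(f) = f(s)

s_ref = s                             # fixed element v = psi(s_ref)

def phi(x):
    """Shifted embedding phi(x) = psi(x) - psi(s_ref)."""
    return ev(x) - ev(s_ref)

print(np.allclose(phi(s_ref), 0))
```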

B Topologies on $\mathcal{S}$ and $V$

Secondly, we give the definition of the topologies on $V$ and $\mathcal{S}$. In what follows, for any vector space $W$, let $\mathcal{L}_C(W)$ denote the set of all $f \in \mathcal{L}(W)$ bounded on a given subset $C$ of $W$. For any t.v.s. $W$, let $\mathcal{L}_{\mathrm{c}}(W)$ denote the set of all continuous $f \in \mathcal{L}(W)$. For a convex subset $C$ of a vector space $W$ and a subset $\mathcal{F}$ of $\mathcal{A}(C)$, let $\sigma(C, \mathcal{F})$ denote the weakest topology on $C$ that makes every $f \in \mathcal{F}$ continuous. For a topology $\mathcal{T}$ on a space $X$ and a subset $Y$ of $X$, let $\mathcal{T}|_Y$ denote the relative topology on $Y$ induced by $\mathcal{T}$. For two topologies $\mathcal{T}$ and $\mathcal{T}'$ on the same set $X$, we write $\mathcal{T} \subseteq \mathcal{T}'$ to signify that $\mathcal{T}'$ is stronger than or equal to $\mathcal{T}$ (i.e., every $\mathcal{T}$-open subset of $X$ is $\mathcal{T}'$-open). Moreover, let $\mathcal{E}$ denote the set of all $e \in \mathcal{A}(\mathcal{S})$ such that $e(\mathcal{S}) \subseteq [0, 1]$.
Now we define the topology $\mathcal{T}(V)$ on $V$ by

$$\mathcal{T}(V) = \sigma(V, \mathcal{L}_{\mathcal{S}}(V)).$$



This topology makes $V$ a l.c.t.v.s. (see, e.g., [18, Chap. II, Sect. 5]). Moreover, this $V$ satisfies the following property:

Lemma B.1. The t.v.s. $V$ is Hausdorff.



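Proof. First, since $\mathcal{S}$ is convex, an elementary argument shows that the affine hull $\mathrm{Aff}(\mathcal{S}) = V$ of $\mathcal{S}$ consists of all elements of the form $\lambda s - \lambda' s'$ with $s, s' \in \mathcal{S}$, $\lambda \geq 1$, and $\lambda - \lambda' = 1$. Let $v = \lambda s - \lambda' s'$ and $v' = \mu t - \mu' t'$ be distinct elements of $V$ written in the above form. Now put

$$p = \frac{\lambda - 1}{\lambda + \mu - 1}, \qquad q = \frac{\mu - 1}{\lambda + \mu - 1}, \qquad r = \frac{1}{\lambda + \mu - 1};$$

then $p, q \geq 0$, $r > 0$, and $p + q + r = 1$. Moreover, put

$$w = p s' + q t' + r v, \qquad w' = p s' + q t' + r v'.$$

Then $w \neq w'$ since $v \neq v'$ and $r > 0$, while we have

$$w = r\lambda s + (p - r\lambda') s' + q t' = (1 - q) s + q t' \in \mathcal{S}$$

since $\mathcal{S}$ is convex, and similarly $w' \in \mathcal{S}$. Since $\mathcal{S} \cong \mathcal{S}_0$ is separated by Lemma 2.1, there exists an $e \in \mathcal{E}$ such that $e(w) \neq e(w')$. Now, by the definitions of $w$ and $w'$, the affine extension $f$ of $e$ to $V$ satisfies $f - f(0) \in \mathcal{L}_{\mathcal{S}}(V)$ and $f(v) \neq f(v')$. Thus $V$ is Hausdorff with respect to $\sigma(V, \mathcal{L}_{\mathcal{S}}(V))$. Hence Lemma B.1 holds. □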
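The choice of $p$, $q$, $r$ in the proof of Lemma B.1 can be checked numerically. In the sketch below, arbitrary vectors stand in for $s, s', t, t'$ (assumed, for illustration only, to lie in a convex set in $\mathbb{R}^3$), and $\lambda, \mu \geq 1$ are sample values; it verifies $p + q + r = 1$ and the two convex-combination formulas for $w$ and $w'$.

```python
import numpy as np

rng = np.random.default_rng(3)
s, s_p, t, t_p = (rng.random(3) for _ in range(4))  # s, s', t, t' (hypothetical points)
lam, mu = 1.8, 2.4                                   # lam, mu >= 1
lam_p, mu_p = lam - 1, mu - 1                        # lam' = lam - 1, mu' = mu - 1
v = lam * s - lam_p * s_p
v_prime = mu * t - mu_p * t_p

den = lam + mu - 1
p, q, r = (lam - 1) / den, (mu - 1) / den, 1 / den
w = p * s_p + q * t_p + r * v
w_prime = p * s_p + q * t_p + r * v_prime

print(np.isclose(p + q + r, 1.0))
print(np.allclose(w, (1 - q) * s + q * t_p))          # w is a convex combination of s and t'
print(np.allclose(w_prime, p * s_p + (1 - p) * t))    # w' is a convex combination of s' and t
```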

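On the other hand, the induced topology on $\mathcal{S}$ satisfies the following:

Lemma B.2. The two topologies $\mathcal{T}(V)|_{\mathcal{S}}$ and $\sigma(\mathcal{S}, \mathcal{E})$ on $\mathcal{S}$ coincide.

Proof. In the proof, put $\mathcal{T} = \mathcal{T}(V) = \sigma(V, \mathcal{L}_{\mathcal{S}}(V))$. First, we show that each $e \in \mathcal{E}$ is $(\mathcal{T}|_{\mathcal{S}})$-continuous. Since $\mathrm{Aff}(\mathcal{S}) = V$, this $e$ extends to an affine functional $f$ on $V$ such that $f(\mathcal{S})$ is bounded; therefore $f + \alpha \in \mathcal{L}_{\mathcal{S}}(V)$ for some $\alpha \in \mathbb{R}$. Thus $f + \alpha$ is $\mathcal{T}$-continuous by the definition of $\mathcal{T}$; therefore $f$ is also $\mathcal{T}$-continuous and $e = f|_{\mathcal{S}}$ is $(\mathcal{T}|_{\mathcal{S}})$-continuous, as desired. This implies that $\sigma(\mathcal{S}, \mathcal{E}) \subseteq \mathcal{T}|_{\mathcal{S}}$.

Now it suffices to show that each $(\mathcal{T}|_{\mathcal{S}})$-open subset $U$ of $\mathcal{S}$ is $\sigma(\mathcal{S}, \mathcal{E})$-open. Take a $\mathcal{T}$-open subset $U'$ of $V$ such that $U = U' \cap \mathcal{S}$. Then for each $s \in U \subseteq U'$, by the definition of $\mathcal{T}$, there exist finitely many $f_i \in \mathcal{L}_{\mathcal{S}}(V)$ and the same number of open subsets $W_i \subseteq \mathbb{R}$ such that $s \in \bigcap_i f_i^{-1}(W_i) \subseteq U'$. Since $s \in \mathcal{S}$, we have $s \in \bigcap_i (\mathcal{S} \cap f_i^{-1}(W_i)) \subseteq U$; therefore it suffices to show that each subset $\mathcal{S} \cap f_i^{-1}(W_i)$ of $\mathcal{S}$ is $\sigma(\mathcal{S}, \mathcal{E})$-open. Since $f_i(\mathcal{S})$ is bounded, there exist $\alpha_i, \beta_i \in \mathbb{R}$ such that $\alpha_i \neq 0$ and the functional $g_i = \alpha_i f_i + \beta_i$ satisfies $g_i(\mathcal{S}) \subseteq [0, 1]$; therefore $e_i = g_i|_{\mathcal{S}} \in \mathcal{E}$. Moreover, we have $f_i^{-1}(W_i) = g_i^{-1}(\alpha_i W_i + \beta_i)$, and $W'_i = \alpha_i W_i + \beta_i$ is also an open subset of $\mathbb{R}$. Thus $\mathcal{S} \cap f_i^{-1}(W_i) = \mathcal{S} \cap g_i^{-1}(W'_i) = e_i^{-1}(W'_i)$, which is $\sigma(\mathcal{S}, \mathcal{E})$-open by the definition of $\sigma(\mathcal{S}, \mathcal{E})$. Hence Lemma B.2 holds. □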
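The rescaling $g_i = \alpha_i f_i + \beta_i$ in the proof of Lemma B.2 is elementary. The sketch below computes one valid choice of $(\alpha_i, \beta_i)$ from the bounds of $f_i(\mathcal{S})$ and checks, on a few sample values, that $f_i^{-1}(W_i) = g_i^{-1}(\alpha_i W_i + \beta_i)$ for an open interval $W_i$. The helper name, the sample values, and the interval are hypothetical, and the degenerate branch is only one of several possible choices.

```python
def rescale_to_unit(lo, hi):
    """Return (alpha, beta) with alpha != 0 such that alpha * [lo, hi] + beta lies in [0, 1]."""
    if hi > lo:
        return 1.0 / (hi - lo), -lo / (hi - lo)
    return 1.0, -lo  # degenerate case f(S) = {lo}: g = f - lo vanishes on S

values = [-3.0, -1.0, 2.0, 5.0]  # sample values f_i(s), s in S (hypothetical)
alpha, beta = rescale_to_unit(min(values), max(values))
g_values = [alpha * y + beta for y in values]
print(all(0.0 <= y <= 1.0 for y in g_values))

# f^{-1}(W) = g^{-1}(alpha * W + beta) for the open interval W = (w_lo, w_hi); here alpha > 0
w_lo, w_hi = -2.0, 3.0
inside_f = [w_lo < y < w_hi for y in values]
inside_g = [alpha * w_lo + beta < alpha * y + beta < alpha * w_hi + beta for y in values]
print(inside_f == inside_g)
```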
C The Completions of $\mathcal{S}$ and $V$

To proceed further with the proof of Theorem 2.1, we recall the following notion: the completion of a uniform space $X$ is a complete uniform space $\overline{X}$ such that $X$ is a dense subspace of $\overline{X}$ (see, e.g., [5, Chap. II] or [18] for properties of uniform spaces). The completion $\overline{X}$ of such a space $X$ always exists, and $\overline{X}$ is Hausdorff if and only if $X$ is Hausdorff. Since any t.v.s. is a uniform space (see, e.g., Proposition 1.4 in [18, Chap. I]), the completion $\overline{V}$ of the Hausdorff t.v.s. $V$ exists in the above sense. Moreover, this $\overline{V}$ also admits the structure of a t.v.s.; now $\overline{V}$ is a complete Hausdorff t.v.s., and $V$ is a topological vector subspace of $\overline{V}$ (with the induced topology equal to $\sigma(V, \mathcal{L}_{\mathcal{S}}(V))$) that is dense in $\overline{V}$ (see, e.g., Proposition 1.5 in [18, Chap. I]). Here we use the conventional notation $\overline{V}$ for the completion of $V$, though it is not necessarily the same as (but is closely related to) the $\overline{V}$ in Theorem 2.1.

Since $\mathcal{S}$ is convex, the closure $\overline{\mathcal{S}} = \mathrm{cl}_{\overline{V}}(\mathcal{S})$ of $\mathcal{S}$ in $\overline{V}$ is also convex (see, e.g., Proposition 1.2 in [18, Chap. II]). Again, note that this $\overline{\mathcal{S}}$ does not necessarily coincide with (but is closely related to) the $\overline{\mathcal{S}}$ in Theorem 2.1. Now the closed subset $\overline{\mathcal{S}}$ of the complete t.v.s. $\overline{V}$ is also complete (as a uniform subspace); therefore $\overline{\mathcal{S}}$ is the completion of $\mathcal{S}$ (as a uniform subspace of $V$) since $\mathcal{S}$ is dense in $\overline{\mathcal{S}}$. We would like to show that $\overline{\mathcal{S}}$ is compact; we give a lemma for this purpose. Here we use the following terminology: a subset $B$ of a t.v.s. $W$ is called bounded if for any 0-neighborhood (i.e., neighborhood of the origin) $U$ of $W$, there exists a $\lambda \in \mathbb{R}$ such that $B \subseteq \lambda U$. Then we have the following:

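Lemma C.1. The convex subset $\mathcal{S}$ of $V$ is bounded in $V$.

Proof. By the definition of the topology on $V$, each 0-neighborhood $U$ in $V$ contains an open 0-neighborhood of the form $\bigcap_i f_i^{-1}(U'_i)$ with finitely many $f_i \in \mathcal{L}_{\mathcal{S}}(V)$ and the same number of open subsets $U'_i$ of $\mathbb{R}$ containing 0. Since each $f_i(\mathcal{S}) \subseteq \mathbb{R}$ is bounded, there is a $\lambda > 0$ such that $f_i(\mathcal{S}) \subseteq \lambda U'_i$ for every $i$. Thus $\mathcal{S} \subseteq \lambda f_i^{-1}(U'_i)$ for every $i$; therefore $\mathcal{S} \subseteq \lambda U$. Hence the lemma holds. □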
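The choice of $\lambda$ in the proof of Lemma C.1 can be made explicit when each $U'_i$ contains an interval $(-\delta_i, \delta_i)$ and $\sup_{s \in \mathcal{S}} |f_i(s)| \leq B_i$: any $\lambda > \max_i B_i/\delta_i$ gives $f_i(\mathcal{S}) \subseteq [-B_i, B_i] \subseteq \lambda U'_i$. The sketch below computes such a $\lambda$ for hypothetical numbers; the helper name is illustrative only.

```python
def absorbing_scale(bounds, radii):
    """Return lam > 0 with bounds[i] < lam * radii[i] for all i.

    Assumes sup |f_i(S)| <= bounds[i] and that U'_i contains (-radii[i], radii[i]),
    so that f_i(S) is contained in lam * U'_i for every i.
    """
    return max(b / r for b, r in zip(bounds, radii)) * 1.01 + 1e-12  # margin for strict inclusion

bounds = [2.0, 0.5, 7.0]  # hypothetical bounds on |f_i| over S
radii = [0.1, 0.2, 1.0]   # hypothetical radii of intervals inside U'_i
lam = absorbing_scale(bounds, radii)
print(lam, all(b < lam * r for b, r in zip(bounds, radii)))
```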

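Now note that the topology $\mathcal{T}(V) = \sigma(V, \mathcal{L}_{\mathcal{S}}(V))$ of $V$ is a weak topology, i.e., it coincides with $\sigma(V, \mathcal{L}_{\mathrm{c}}(V))$, where continuity of each $f \in \mathcal{L}_{\mathrm{c}}(V)$ is with respect to $\mathcal{T}(V)$ (namely, every member of $\mathcal{L}_{\mathcal{S}}(V)$ is continuous with respect to $\sigma(V, \mathcal{L}_{\mathrm{c}}(V))$, and every member of $\mathcal{L}_{\mathrm{c}}(V)$ is continuous with respect to $\mathcal{T}(V)$). Since $\mathcal{S} \subseteq V$ is bounded by Lemma C.1 and $V$ is l.c., it follows that $\mathcal{S}$ is precompact, i.e., the completion $\overline{\mathcal{S}}$ of $\mathcal{S}$ is compact (see, e.g., Corollary 2 of Proposition 5.5 in [18, Chap. IV]). The current situation is summarized as follows:

• $\mathcal{S} \cong \mathcal{S}_0$ is a convex subset of an l.c. Hausdorff t.v.s. $V$ containing the origin, with $\mathrm{Aff}(\mathcal{S}) = V$, such that the induced topology on $\mathcal{S}$ is $\sigma(\mathcal{S}, \mathcal{E})$;

• the topology $\mathcal{T}(V)$ of $V$ is $\sigma(V, \mathcal{L}_{\mathcal{S}}(V)) = \sigma(V, \mathcal{L}_{\mathrm{c}}(V))$;

• $\overline{V}$ is a complete Hausdorff t.v.s. containing $V$ as a dense topological vector subspace;

• $\overline{\mathcal{S}} = \mathrm{cl}_{\overline{V}}(\mathcal{S})$ is the completion of $\mathcal{S}$, which is compact and convex.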

D Existence of the Objects in Theorem 2.1

From now on, we modify the above objects to obtain the objects in Theorem 2.1. In what follows, for a t.v.s. $W$, let $\sigma(W)$ denote the weak topology $\sigma(W, \mathcal{L}_{\mathrm{c}}(W))$ on $W$. The following facts will be used in our argument:

Proposition D.1 (Corollary 2 of Theorem 4.1 in [18, Chap. IV]). Let $W$ be an l.c.t.v.s. with topology $\mathcal{T}$, $W'$ a vector subspace of $W$, and $W/W'$ the quotient space. Then the weak topology $\sigma(W')$ on $W'$ with respect to $\mathcal{T}|_{W'}$ coincides with $\sigma(W)|_{W'}$, and the weak topology $\sigma(W/W')$ on $W/W'$ with respect to the quotient topology induced by $\mathcal{T}$ is the quotient topology induced by $\sigma(W)$.

Theorem D.1 (Theorem 4.2 in [18, Chap. II]). Let $W$ be an l.c.t.v.s., $W'$ a vector subspace of $W$, and $f \in \mathcal{L}_{\mathrm{c}}(W')$. Then $f$ extends to an $\bar{f} \in \mathcal{L}_{\mathrm{c}}(W)$.

Note that the weak topology $\sigma(\overline{V})$ on $\overline{V}$ with respect to the original topology $\mathcal{T}$ of $\overline{V}$ is weaker than or equal to $\mathcal{T}$; therefore $\overline{\mathcal{S}}$ is also compact with respect to $\sigma(\overline{V})$. Now we have the following property:

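Lemma D.1. We have $\sigma(\overline{V})|_V = \mathcal{T}(V) = \sigma(V, \mathcal{L}_{\mathcal{S}}(V))$.

Proof. Note that $\sigma(\overline{V})|_V \subseteq \sigma(V, \mathcal{L}_{\mathcal{S}}(V))$ since $\sigma(\overline{V}) \subseteq \mathcal{T}$ and $\mathcal{T}|_V = \mathcal{T}(V)$ by the definition of $\overline{V}$. Thus it suffices to show that each $f \in \mathcal{L}_{\mathcal{S}}(V)$ is continuous with respect to $\sigma(\overline{V})|_V$. Now this $f$ is $(\mathcal{T}|_V)$-continuous since $\mathcal{T}|_V = \mathcal{T}(V)$; therefore Theorem D.1 implies that $f$ extends to a $\mathcal{T}$-continuous $g \in \mathcal{L}(\overline{V})$. This $g$ is also $\sigma(\overline{V})$-continuous by the definition of $\sigma(\overline{V})$; therefore $f = g|_V$ is continuous with respect to $\sigma(\overline{V})|_V$, as desired. Hence the lemma holds. □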

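In what follows, continuity of a map from $\overline{V}$ is considered with respect to $\sigma(\overline{V})$ instead of $\mathcal{T}$ unless otherwise specified. Let $V_0$ denote the intersection of the kernels $\ker(f)$ of all $f \in \mathcal{L}_{\mathrm{c}}(\overline{V})$. Let $\pi$ denote the quotient map $\overline{V} \to \overline{V}/V_0$, and let $\widetilde{\mathcal{T}} = \pi(\sigma(\overline{V}))$ denote the quotient topology on $\pi(\overline{V})$ induced by $\sigma(\overline{V})$. Note that for any $f \in \mathcal{L}_{\mathrm{c}}(\overline{V})$, there exists a unique $\tilde{f} \in \mathcal{L}_{\mathrm{c}}(\pi(\overline{V}))$ such that $f = \tilde{f} \circ \pi$, and any element of $\mathcal{L}_{\mathrm{c}}(\pi(\overline{V}))$ is obtained in this manner. Thus, by Proposition D.1, the topology $\widetilde{\mathcal{T}}$ of $\pi(\overline{V})$ is a weak topology and coincides with $\sigma(\pi(\overline{V}), \widetilde{\mathcal{F}})$, where $\widetilde{\mathcal{F}} = \{\tilde{f} \mid f \in \mathcal{L}_{\mathrm{c}}(\overline{V})\}$; therefore $\pi(\overline{V})$ is an l.c.t.v.s. that is Hausdorff by the definition of $V_0$. Note that $\pi(V)$ is a linear subspace of $\pi(\overline{V})$ and $\pi(\mathcal{S})$ is convex in $\pi(\overline{V})$. Similarly, $\pi(\overline{\mathcal{S}})$ is also convex in $\pi(\overline{V})$, and $\pi(\overline{\mathcal{S}})$ is compact since $\overline{\mathcal{S}}$ is compact and $\pi$ is continuous. On the other hand, since $\sigma(\overline{V}) \subseteq \mathcal{T}$, $V$ is $\mathcal{T}$-dense in $\overline{V}$, and $\mathcal{S}$ is $(\mathcal{T}|_{\overline{\mathcal{S}}})$-dense in $\overline{\mathcal{S}}$, it follows that $V$ is also $\sigma(\overline{V})$-dense in $\overline{V}$ and $\mathcal{S}$ is also $(\sigma(\overline{V})|_{\overline{\mathcal{S}}})$-dense in $\overline{\mathcal{S}}$; therefore $\pi(V)$ is dense in $\pi(\overline{V})$ and $\pi(\mathcal{S})$ is dense in $\pi(\overline{\mathcal{S}})$ since $\pi$ is continuous. Moreover, we have the following two properties: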

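Lemma D.2. We have $\widetilde{\mathcal{T}}|_{\pi(V)} = \sigma(\pi(V), \mathcal{L}_{\pi(\mathcal{S})}(\pi(V)))$.

Proof. Since $\widetilde{\mathcal{T}}|_{\pi(V)}$ is a weak topology by Proposition D.1, it suffices to show that an $f \in \mathcal{L}(\pi(V))$ is $(\widetilde{\mathcal{T}}|_{\pi(V)})$-continuous if and only if $f \in \mathcal{L}_{\pi(\mathcal{S})}(\pi(V))$. First, let $f \in \mathcal{L}_{\pi(\mathcal{S})}(\pi(V))$. Then $f \circ \pi|_V \in \mathcal{L}_{\mathcal{S}}(V)$; therefore $f \circ \pi|_V \in \mathcal{L}_{\mathrm{c}}(V)$ by the definition of the topology of $V$. By Lemma D.1, $f \circ \pi|_V$ is also $(\sigma(\overline{V})|_V)$-continuous. Thus Theorem D.1 implies that $f \circ \pi|_V$ extends to a $g \in \mathcal{L}_{\mathrm{c}}(\overline{V})$. Take the $\tilde{g} \in \mathcal{L}_{\mathrm{c}}(\pi(\overline{V}))$ corresponding to $g$. Then we have $\tilde{g}(\pi(v)) = g(v) = f(\pi(v))$ for any $v \in V$; therefore $\tilde{g}|_{\pi(V)} = f$. Thus $f$ is $(\widetilde{\mathcal{T}}|_{\pi(V)})$-continuous.

Secondly, let $f \in \mathcal{L}(\pi(V))$ be $(\widetilde{\mathcal{T}}|_{\pi(V)})$-continuous. Then, by Theorem D.1, this $f$ extends to a $g \in \mathcal{L}_{\mathrm{c}}(\pi(\overline{V}))$. Now $g \circ \pi \in \mathcal{L}_{\mathrm{c}}(\overline{V})$; therefore $B = g \circ \pi(\overline{\mathcal{S}})$ is bounded in $\mathbb{R}$ since $\overline{\mathcal{S}}$ is compact. Moreover, we have $f(\pi(s)) = g(\pi(s)) \in B$ for each $s \in \mathcal{S}$; therefore $f(\pi(\mathcal{S})) \subseteq B$ is also bounded in $\mathbb{R}$. Thus we have $f \in \mathcal{L}_{\pi(\mathcal{S})}(\pi(V))$. Hence Lemma D.2 holds. □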

Lemma D.3. $\pi|_V$ is a bijection from $V$ to $\pi(V)$.

Proof. Let $v$ and $v'$ be distinct elements of $V$. Then, since $V$ is Hausdorff by Lemma B.1 and the topology of $V$ is a weak topology, there exists an $f \in \mathcal{L}_{\mathrm{c}}(V)$ such that $f(v) \neq f(v')$. Now Lemma D.1 and Theorem D.1 imply that this $f$ extends to a $g \in \mathcal{L}_{\mathrm{c}}(\overline{V})$, and we have $g(v) \neq g(v')$. Thus $v - v' \notin V_0$ and $\pi(v) \neq \pi(v')$. Hence the lemma holds. □

By Lemma D.2, Lemma D.3, and the definition of $\mathcal{T}(V)$, the map $\pi|_V$ is an isomorphism of t.v.s. from $V$ to $\pi(V)$. Moreover, $\pi|_{\mathcal{S}}\colon \mathcal{S} \to \pi(\mathcal{S})$ is also an isomorphism of convex structures.

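The current situation is summarized as follows:

• $\pi(\overline{V})$ is an l.c. Hausdorff t.v.s. with a weak topology;

• $\pi(V)$ is a topological vector subspace of $\pi(\overline{V})$, with the induced topology equal to $\sigma(\pi(V), \mathcal{L}_{\pi(\mathcal{S})}(\pi(V)))$, that is dense in $\pi(\overline{V})$;

• $\pi(\mathcal{S}) \cong \mathcal{S}_0$ is a convex subset of $\pi(V)$ that contains the origin of $\pi(V)$ and satisfies $\mathrm{Aff}(\pi(\mathcal{S})) = \pi(V)$, with the relative topology $\sigma(\pi(\mathcal{S}), \mathcal{E}(\pi(\mathcal{S})))$;

• $\pi(\overline{\mathcal{S}})$ is the closure of $\pi(\mathcal{S})$ in $\pi(\overline{V})$, which is convex and compact.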

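Note that the above objects $\pi(\mathcal{S})$, $\pi(V)$, $\pi(\overline{\mathcal{S}})$, and $\pi(\overline{V})$ would be the desired objects in Theorem 2.1 if the affine hull of $\pi(\overline{\mathcal{S}})$ coincided with $\pi(\overline{V})$. However, this is not necessarily guaranteed in general. Instead, we take the linear subspace $W = \mathrm{Aff}(\pi(\overline{\mathcal{S}}))$ of $\pi(\overline{V})$ (note that $\pi(\overline{\mathcal{S}})$ contains the origin of $\pi(\overline{V})$). Then $W$ is also an l.c. Hausdorff t.v.s., and the topology of $W$ is also a weak topology by Proposition D.1. This $W$ contains $\pi(V)$ since $\pi(V) = \mathrm{Aff}(\pi(\mathcal{S}))$, and $\pi(V)$ is dense in $W$ since it is dense in $\pi(\overline{V})$. On the other hand, $\pi(\overline{\mathcal{S}})$ is also the compact closure of $\pi(\mathcal{S})$ in $W$ since $\pi(\overline{\mathcal{S}}) \subseteq W$. Moreover, taking the completion $X$ of the Hausdorff uniform space $\pi(\overline{\mathcal{S}})$, the compact subset $\pi(\overline{\mathcal{S}})$ of the Hausdorff space $X$ is closed in $X$; therefore $X = \mathrm{cl}_X(\pi(\overline{\mathcal{S}})) = \pi(\overline{\mathcal{S}})$, and $\pi(\overline{\mathcal{S}})$ itself is complete. Thus the objects $\pi(\mathcal{S})$, $\pi(V)$, $\pi(\overline{\mathcal{S}})$, and $W$ play the roles of $\mathcal{S}$, $V$, $\overline{\mathcal{S}}$, and $\overline{V}$ in Theorem 2.1, respectively. Hence the existence of the objects in Theorem 2.1 is proven.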

E Uniqueness of the Objects in Theorem 2.1

Finally, we prove the uniqueness of the objects in Theorem 2.1 (in the sense specified in the statement). Let $(\mathcal{S}, V, \overline{\mathcal{S}}, \overline{V})$ and $(\mathcal{S}', V', \overline{\mathcal{S}}', \overline{V}')$ be two collections of objects as in the statement. First, since $\mathcal{S} \cong \mathcal{S}_0 \cong \mathcal{S}'$, there exists an affine isomorphism $f\colon \mathcal{S} \to \mathcal{S}'$. Since $V = \mathrm{Aff}(\mathcal{S})$ and $V' = \mathrm{Aff}(\mathcal{S}')$, this $f$ extends to an affine isomorphism $V \to V'$, also denoted by $f$ (thus $f(\mathcal{S}) = \mathcal{S}'$). Now note that the topology $\mathcal{T}(V)$ of $V$ is also the weakest topology that makes continuous every affine functional $g$ on $V$ such that $g(\mathcal{S})$ is bounded in $\mathbb{R}$. The same also holds for $V'$. Moreover, for each affine functional $g$ on $V$, $g(\mathcal{S})$ is bounded if and only if $g \circ f^{-1}(\mathcal{S}')$ is bounded. Thus it follows from the above properties of $\mathcal{T}(V)$ and $\mathcal{T}(V')$ that the affine isomorphism $f\colon V \to V'$ is also a homeomorphism of topological spaces.

From now on, we show that this $f\colon V \to V'$ extends to the map $\overline{V} \to \overline{V}'$ specified in Theorem 2.1. For this purpose, take the completions $W$ and $W'$ of $\overline{V}$ and of $\overline{V}'$, respectively (cf. Appendix C). Then $W$ is also a Hausdorff t.v.s. and contains $\overline{V}$ (hence $V$) as a dense topological vector subspace. The same also holds for $W'$, $\overline{V}'$, and $V'$. Since $W$ and $W'$ are complete, $V$ is dense in $W$, and $V'$ is dense in $W'$, it follows that the above homeomorphism $f\colon V \to V'$ extends to a homeomorphism $W \to W'$, also denoted by $f$. Now we have the following:

Lemma E.1. The above map $f\colon W \to W'$ is also an affine isomorphism.

Proof. It suffices to show that $f$ preserves convex combinations of two elements. Let $\lambda, \mu \geq 0$ such that $\lambda + \mu = 1$. Then for each $v, v' \in V$, we have $\lambda f(v) + \mu f(v') = f(\lambda v + \mu v')$ since $f|_V\colon V \to V'$ is affine. This implies that the two maps $g_1(v, v') = \lambda f(v) + \mu f(v')$ and $g_2(v, v') = f(\lambda v + \mu v')$ from $V \times V$ to $W'$ coincide with each other. Since $V \times V$ is dense in $W \times W$ and $W'$ is complete, the continuous map $g_1 = g_2\colon V \times V \to W'$ has a unique continuous extension $W \times W \to W'$. On the other hand, both $\bar{g}_1(w, w') = \lambda f(w) + \mu f(w')$ and $\bar{g}_2(w, w') = f(\lambda w + \mu w')$ are continuous maps from $W \times W$ to $W'$ and satisfy $\bar{g}_1|_{V \times V} = g_1$ and $\bar{g}_2|_{V \times V} = g_2$. This implies that $\bar{g}_1 = \bar{g}_2$; therefore $f(\lambda w + \mu w') = \lambda f(w) + \mu f(w')$ for any $w, w' \in W$. Hence the lemma holds. □

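Since $\overline{\mathcal{S}} = \mathrm{cl}_{\overline{V}}(\mathcal{S})$ is compact, $\overline{\mathcal{S}}$ is also closed in $W$; therefore $\mathrm{cl}_W(\mathcal{S}) = \overline{\mathcal{S}}$. Similarly, we have $\mathrm{cl}_{W'}(\mathcal{S}') = \overline{\mathcal{S}}'$. Since $f\colon W \to W'$ is a homeomorphism and $f(\mathcal{S}) = \mathcal{S}'$, we have $f(\overline{\mathcal{S}}) = \overline{\mathcal{S}}'$. Moreover, since $f\colon W \to W'$ is an affine isomorphism, $\overline{V} = \mathrm{Aff}(\overline{\mathcal{S}})$, and $\overline{V}' = \mathrm{Aff}(\overline{\mathcal{S}}')$, we have $f(\overline{V}) = \overline{V}'$. Thus $f|_{\overline{V}}\colon \overline{V} \to \overline{V}'$ is the desired map specified in Theorem 2.1. Hence the proof of Theorem 2.1 is concluded.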